Sivaji Bandyopadhyay

Also published as: Sivaji B, Sivaji Bandopadhyay, Sivaju Bandyopadhyay


2021

pdf bib
Multiple Captions Embellished Multilingual Multi-Modal Neural Machine Translation
Salam Michael Singh | Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021)

Neural machine translation based on bilingual text with limited training data suffers from limited lexical diversity, which lowers the rare word translation accuracy and reduces the generalizability of the translation system. In this work, we utilise the multiple captions from the Multi-30K dataset to increase the lexical diversity, aided by the cross-lingual transfer of information among the languages in a multilingual setup. In this multilingual and multimodal setting, the inclusion of the visual features boosts the translation quality by a significant margin. Empirical study affirms that our proposed multimodal approach achieves substantial gain in terms of the automatic score and shows robustness in handling rare word translation in the context of English to/from Hindi and Telugu translation tasks.

pdf bib
Low Resource Multimodal Neural Machine Translation of English-Hindi in News Domain
Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021)

Incorporating multiple input modalities in a machine translation (MT) system is gaining popularity among MT researchers. Unlike the publicly available dataset for Multimodal Machine Translation (MMT) tasks, where the captions are short image descriptions, the news captions provide a more detailed description of the contents of the images. As a result, numerous named entities relating to specific persons, locations, etc., are found. In this paper, we acquire two monolingual news datasets reported in English and Hindi paired with the images to generate a synthetic English-Hindi parallel corpus. The parallel corpus is used to train the English-Hindi Neural Machine Translation (NMT) and an English-Hindi MMT system by incorporating the image feature paired with the corresponding parallel corpus. We also conduct a systematic analysis to evaluate the English-Hindi MT systems with 1) more synthetic data and 2) by adding back-translated data. Our finding shows improvement in terms of BLEU scores for both the NMT (+8.05) and MMT (+11.03) systems.

pdf bib
EnKhCorp1.0: An English–Khasi Corpus
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Darsh Kaushik | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

In machine translation, corpus preparation is one of the crucial tasks, particularly for low-resource pairs. In multilingual countries like India, machine translation plays a vital role in communication among people with various linguistic backgrounds. Online automatic translation systems from Google and Microsoft cover many languages but lack support for Khasi, which can hence be considered low-resource. This paper overviews the development of EnKhCorp1.0, a corpus for the English–Khasi pair, and the implementation of baseline systems for English-to-Khasi and Khasi-to-English translation based on the neural machine translation approach.

pdf bib
Sentiment Classification of Code-Mixed Tweets using Bi-Directional RNN and Language Tags
Sainik Mahata | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Sentiment analysis tools and models have been developed extensively over the years for European languages. In contrast, similar tools for Indian languages are scarce. This is because state-of-the-art pre-processing tools such as POS taggers, shallow parsers, etc., are not readily available for Indian languages. Although such tools are available for Indian languages that are spoken by the majority of the population, like Hindi and Bengali, finding the same for less spoken languages like Tamil, Telugu, and Malayalam is difficult. Moreover, due to the advent of social media, the multilingual population of India, who are comfortable with both English and their regional language, prefer to communicate by mixing both languages. This gives rise to massive code-mixed content, and automatically annotating it with sentiment labels becomes a challenging task. In this work, we take up a similar challenge of developing a sentiment analysis model that can work with English-Tamil code-mixed data. The proposed work tries to solve this by using bi-directional LSTMs along with language tagging. Other traditional methods, based on classical machine learning algorithms, have also been discussed in the literature, and they act as the baseline systems to which we compare our neural network based model. The developed model, based on a neural network architecture, achieved precision, recall, and F1 scores of 0.59, 0.66, and 0.58 respectively.

pdf bib
Improved English to Hindi Multimodal Neural Machine Translation
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Darsh Kaushik | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

Machine translation performs automatic translation from one natural language to another. Neural machine translation is the state-of-the-art approach in machine translation, but it requires adequate training data, which is a severe problem for low-resource language pairs. The multimodal concept is introduced in neural machine translation (NMT) by merging textual features with visual features to improve low-resource pair translation. WAT2021 (Workshop on Asian Translation 2021) organized a shared task of multimodal translation for English to Hindi. We participated in it under the team name CNLP-NITS-PP with two submissions: multimodal and text-only NMT. This work investigates phrase pairs injection via a data augmentation approach and attains improvement over our previous work at WAT2020 on the same task in both text-only and multimodal NMT. We achieved second rank on the challenge test set for English to Hindi multimodal translation, with a Bilingual Evaluation Understudy (BLEU) score of 39.28, a Rank-based Intuitive Bilingual Evaluation Score (RIBES) of 0.792097, and an Adequacy-Fluency Metrics (AMFM) score of 0.830230.

2020

pdf bib
Multimodal Neural Machine Translation for English to Hindi
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 7th Workshop on Asian Translation

Machine translation (MT) focuses on the automatic translation of text from one natural language to another. Neural machine translation (NMT) achieves state-of-the-art results in machine translation because it utilizes advanced deep learning techniques and handles issues like long-term dependency and context analysis. Nevertheless, NMT still suffers from low translation quality for low resource languages. To counter this challenge, the multi-modal concept comes in. The multi-modal concept combines textual and visual features to improve the translation quality of low resource languages. Moreover, the utilization of monolingual data in the pre-training step can improve the performance of the system for low resource language translations. Workshop on Asian Translation 2020 (WAT2020) organized a translation task for multimodal translation in English to Hindi. We participated in it with two-track submissions, namely text-only and multi-modal translation, under the team name CNLP-NITS. The results declared at the WAT2020 translation task show that our multi-modal NMT system attained higher scores than our text-only NMT on both the challenge and evaluation test sets. For the challenge test data, our multi-modal neural machine translation system achieves a Bilingual Evaluation Understudy (BLEU) score of 33.57, a Rank-based Intuitive Bilingual Evaluation Score (RIBES) of 0.754141, and an Adequacy-Fluency Metrics (AMFM) score of 0.787320; for the evaluation test data, it achieves BLEU, RIBES, and AMFM scores of 40.51, 0.803208, and 0.820980 respectively for English to Hindi translation.

pdf bib
Hindi-Marathi Cross Lingual Model
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the Fifth Conference on Machine Translation

Machine Translation (MT) is a vital tool for aiding communication between linguistically separate groups of people. Neural machine translation (NMT) based approaches have gained widespread acceptance because of their outstanding performance. We participated in the WMT20 shared task on similar language translation for the Hindi-Marathi pair. The main challenge of this task is to utilize monolingual data and the similarity features of the similar language pair to overcome the limitation of available parallel data. In this work, we implemented an NMT based model that simultaneously learns bilingual embedding from both the source and target language pairs. Our model achieved a Hindi to Marathi bilingual evaluation understudy (BLEU) score of 11.59, rank-based intuitive bilingual evaluation score (RIBES) of 57.76 and translation edit rate (TER) score of 79.07, and a Marathi to Hindi BLEU score of 15.44, RIBES of 61.13 and TER score of 75.96.

pdf bib
The NITS-CNLP System for the Unsupervised MT Task at WMT 2020
Salam Michael Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the Fifth Conference on Machine Translation

We describe NITS-CNLP’s submission to the WMT 2020 unsupervised machine translation shared task for German (de) to Upper Sorbian (hsb) in a constrained setting, i.e., using only the data provided by the organizers. We train our unsupervised model using monolingual data from both languages by jointly pre-training the encoder and decoder, and fine-tune it using a backtranslation loss. The final model uses the source side (de) monolingual data and the target side (hsb) synthetic data as pseudo-parallel data to train a pseudo-supervised system, which is tuned using the provided development set (dev set).

pdf bib
English to Manipuri and Mizo Post-Editing Effort and its Impact on Low Resource Machine Translation
Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay | Mihaela Vela | Josef van Genabith
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

We present the first study on the post-editing (PE) effort required to build a parallel dataset for English-Manipuri and English-Mizo, in the context of a project on creating data for machine translation (MT). English source text from a local daily newspaper is machine translated into Manipuri and Mizo using PBSMT systems built in-house. A Computer Assisted Translation (CAT) tool is used to record the time, keystrokes and other indicators to measure PE effort in terms of temporal and technical effort. A positive correlation between the technical effort and the number of function words is seen for English-Manipuri and English-Mizo, but a negative correlation between the technical effort and the number of noun words for English-Mizo. However, average time spent per token in PE English-Mizo text is negatively correlated with the temporal effort. The main reasons for these results are (i) English and Mizo using the same script, while Manipuri uses a different script, and (ii) the agglutinative nature of Manipuri. Further, we check the impact of training an MT system in an incremental approach, by including the post-edited dataset as additional training data. The result shows an increase in HBLEU of up to 4.6 for English-Manipuri.

pdf bib
JUNLP@ICON2020: Low Resourced Machine Translation for Indic Languages
Sainik Mahata | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 17th International Conference on Natural Language Processing (ICON): Adap-MT 2020 Shared Task

In the current work, we present the description of the systems submitted to a machine translation shared task organized by ICON 2020: 17th International Conference on Natural Language Processing. The systems were developed to show the capability of general domain machine translation when translating into Indic languages, English-Hindi, in our case. The paper shows the training process and quantifies the performance of two state-of-the-art translation systems, viz., Statistical Machine Translation and Neural Machine Translation. While Statistical Machine Translation systems work better in a low-resource setting, Neural Machine Translation systems are able to generate sentences that are fluent in nature. Since both these systems have contrasting advantages, a hybrid system, incorporating both, was also developed to leverage all the strong points. The submitted systems garnered BLEU scores of 8.701943312, 0.6361336198, and 11.78873307 respectively and the scores of the hybrid system helped us to the fourth spot in the competition leaderboard.

pdf bib
Zero-Shot Neural Machine Translation: Russian-Hindi @LoResMT 2020
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

Neural machine translation (NMT) is a widely accepted approach in the machine translation (MT) community, translating from one natural language to another. Although NMT shows remarkable performance in both high and low resource languages, it needs a sufficient training corpus. The availability of a parallel corpus for low resource language pairs is one of the challenges in MT. To mitigate this issue, NMT attempts to utilize a monolingual corpus to get better at translation for low resource language pairs. The Workshop on Technologies for MT of Low Resource Languages (LoResMT 2020) organized shared tasks of low resource language pair translation using zero-shot NMT. Here, the parallel corpus is not used and only monolingual corpora are allowed. We participated in this shared task under the team name CNLP-NITS for the Russian-Hindi language pair. We used masked sequence to sequence pre-training for language generation (MASS) with only monolingual corpora, following the unsupervised NMT architecture. The results declared at the LoResMT 2020 shared task report that our system achieves a bilingual evaluation understudy (BLEU) score of 0.59, precision score of 3.43, recall score of 5.48, F-measure score of 4.22, and rank-based intuitive bilingual evaluation score (RIBES) of 0.180147 in Russian to Hindi translation. For Hindi to Russian translation, we achieved BLEU, precision, recall, F-measure, and RIBES scores of 1.11, 4.72, 4.41, 4.56, and 0.026842 respectively.

pdf bib
EnAsCorp1.0: English-Assamese Corpus
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

Corpus preparation is one of the important and challenging tasks in machine translation, especially in low resource language scenarios. In a country like India, where multiple languages exist, machine translation attempts to minimize the communication gap among people with different linguistic backgrounds. Although Google Translation covers automatic translation of various languages all over the world, it lags in some languages, including Assamese. In this paper, we have developed EnAsCorp1.0, a corpus for the English-Assamese low resource pair, where parallel and monolingual data are collected from various online sources. We have also implemented baseline systems with statistical machine translation and neural machine translation approaches for the same corpus.

2019

pdf bib
Development of POS tagger for English-Bengali Code-Mixed data
Tathagata Raha | Sainik Mahata | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 16th International Conference on Natural Language Processing

Code-mixed texts are widespread nowadays due to the advent of social media. Since these texts combine two languages to formulate a sentence, it gives rise to various research problems related to Natural Language Processing. In this paper, we try to excavate one such problem, namely, Parts of Speech tagging of code-mixed texts. We have built a system that can POS tag English-Bengali code-mixed data where the Bengali words were written in Roman script. Our approach initially involves the collection and cleaning of English-Bengali code-mixed tweets. These tweets were used as a development dataset for building our system. The proposed system is a modular approach that starts by tagging individual tokens with their respective languages and then passes them to different POS taggers, designed for different languages (English and Bengali, in our case). Tags given by the two systems are later joined together and the final result is then mapped to a universal POS tag set. Our system was checked using 100 manually POS tagged code-mixed sentences and it returned an accuracy of 75.29%.

pdf bib
English to Hindi Multi-modal Neural Machine Translation and Hindi Image Captioning
Sahinur Rahman Laskar | Rohit Pratap Singh | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 6th Workshop on Asian Translation

With the widespread use of Machine Translation (MT) techniques, attempt to minimize communication gap among people from diverse linguistic backgrounds. We have participated in Workshop on Asian Translation 2019 (WAT2019) multi-modal translation task. There are three types of submission track namely, multi-modal translation, Hindi-only image captioning and text-only translation for English to Hindi translation. The main challenge is to provide a precise MT output. The multi-modal concept incorporates textual and visual features in the translation task. In this work, multi-modal translation track relies on pre-trained convolutional neural networks (CNN) with Visual Geometry Group having 19 layered (VGG19) to extract image features and attention-based Neural Machine Translation (NMT) system for translation. The merge-model of recurrent neural network (RNN) and CNN is used for the Hindi-only image captioning. The text-only translation track is based on the transformer model of the NMT system. The official results evaluated at WAT2019 translation task, which shows that our multi-modal NMT system achieved Bilingual Evaluation Understudy (BLEU) score 20.37, Rank-based Intuitive Bilingual Evaluation Score (RIBES) 0.642838, Adequacy-Fluency Metrics (AMFM) score 0.668260 for challenge test data and BLEU score 40.55, RIBES 0.760080, AMFM score 0.770860 for evaluation test data in English to Hindi multi-modal translation respectively.

pdf bib
WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset
Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 6th Workshop on Asian Translation

Multimodal translation is the task of translating a source language to a target language with the help of a parallel text corpus paired with images that represent the contextual details of the text. In this paper, we carried out an extensive comparison to evaluate the benefits of using a multimodal approach for translating text in English to a low resource language, Hindi, as a part of the WAT2019 shared task. We carried out the translation of English to Hindi in three separate tasks with both the evaluation and challenge datasets: first by using only the parallel text corpora, then through an image caption generation approach, and finally with the multimodal approach. Our experiment shows a significant improvement in the result with the multimodal approach over the other approaches.

pdf bib
JUMT at WMT2019 News Translation Task: A Hybrid Approach to Machine Translation for Lithuanian to English
Sainik Kumar Mahata | Avishek Garain | Adityar Rayala | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

In the current work, we present a description of the system submitted to WMT 2019 News Translation Shared task. The system was created to translate news text from Lithuanian to English. To accomplish the given task, our system used a Word Embedding based Neural Machine Translation model to post edit the outputs generated by a Statistical Machine Translation model. The current paper documents the architecture of our model, descriptions of the various modules and the results produced using the same. Our system garnered a BLEU score of 17.6.

pdf bib
Neural Machine Translation: Hindi-Nepali
Sahinur Rahman Laskar | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

With the extensive use of Machine Translation (MT) technology, there is progressively more interest in directly translating between pairs of similar languages, because the main challenge is to overcome the limitation of available parallel data to produce a precise MT output. The current work relies on Neural Machine Translation (NMT) with an attention mechanism for the similar language translation task of the WMT19 shared task in the context of the Hindi-Nepali pair. The NMT systems were trained on the Hindi-Nepali parallel corpus, then tested and analyzed in Hindi ⇔ Nepali translation. The official results declared at the WMT19 shared task show that our NMT system obtained a Bilingual Evaluation Understudy (BLEU) score of 24.6 for the primary configuration in Nepali to Hindi translation. We also achieved BLEU scores of 53.7 (Hindi to Nepali) and 49.1 (Nepali to Hindi) in the contrastive system type.

2018

pdf bib
WME 3.0: An Enhanced and Validated Lexicon of Medical Concepts
Anupam Mondal | Dipankar Das | Erik Cambria | Sivaji Bandyopadhyay
Proceedings of the 9th Global Wordnet Conference

Information extraction in the medical domain is laborious and time-consuming due to the insufficient number of domain-specific lexicons and lack of involvement of domain experts such as doctors and medical practitioners. Thus, in the present work, we are motivated to design a new lexicon, WME 3.0 (WordNet of Medical Events), which contains over 10,000 medical concepts along with their part of speech, gloss (descriptive explanations), polarity score, sentiment, similar sentiment words, category, affinity score and gravity score features. In addition, the manual annotators help to validate the overall as well as individual category level of medical concepts of WME 3.0 using Cohen’s Kappa agreement metric. The agreement score indicates almost correct identification of medical concepts and their assigned features in WME 3.0.

pdf bib
JUCBNMT at WMT2018 News Translation Task: Character Based Neural Machine Translation of Finnish to English
Sainik Kumar Mahata | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

In the current work, we present a description of the system submitted to WMT 2018 News Translation Shared task. The system was created to translate news text from Finnish to English. The system used a Character Based Neural Machine Translation model to accomplish the given task. The current paper documents the preprocessing steps, the description of the submitted system and the results produced using the same. Our system garnered a BLEU score of 12.9.

2017

pdf bib
BUCC2017: A Hybrid Approach for Identifying Parallel Sentences in Comparable Corpora
Sainik Mahata | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 10th Workshop on Building and Using Comparable Corpora

A Statistical Machine Translation (SMT) system is always trained using a large parallel corpus to produce effective translation. Not only is such a corpus scarce, it also involves a lot of manual labor and cost. A parallel corpus can be prepared by employing comparable corpora, where a pair of corpora is in two different languages pointing to the same domain. In the present work, we try to build a parallel corpus for the French-English language pair from a given comparable corpus. The data and the problem set are provided as part of the shared task organized by BUCC 2017. We have proposed a system that first translates the sentences by heavily relying on Moses and then groups the sentences based on sentence length similarity. Finally, the one-to-one sentence selection is done based on the Cosine Similarity algorithm.

pdf bib
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)
Sivaji Bandyopadhyay
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

pdf bib
Relationship Extraction based on Category of Medical Concepts from Lexical Contexts
Anupam Mondal | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

pdf bib
Retrieving Similar Lyrics for Music Recommendation System
Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

2016

pdf bib
JU-USAAR: A Domain Adaptive MT System
Koushik Pahari | Alapan Kuila | Santanu Pal | Sudip Kumar Naskar | Sivaji Bandyopadhyay | Josef van Genabith
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Genetic Algorithm (GA) Implementation for Feature Selection in Manipuri POS Tagging
Kishorjit Nongmeikapam | Sivaji Bandyopadhyay
Proceedings of the 13th International Conference on Natural Language Processing

pdf bib
Statistical Natural Language Generation from Tabular Non-textual Data
Joy Mahapatra | Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of the 9th International Natural Language Generation conference

pdf bib
WME: Sense, Polarity and Affinity based Concept Resource for Medical Events
Anupam Mondal | Dipankar Das | Erik Cambria | Sivaji Bandyopadhyay
Proceedings of the 8th Global WordNet Conference (GWC)

In order to overcome the lack of medical corpora, we have developed a WordNet for Medical Events (WME) for identifying medical terms and their sense related information using a seed list. The initial WME resource contains 1654 medical terms or concepts. In the present research, we have reported the enhancement of WME with 6415 medical concepts along with their conceptual features, viz. Parts-of-Speech (POS), gloss, semantics, polarity, sense and affinity. Several polarity lexicons, viz. SentiWordNet, SenticNet, Bing Liu’s subjectivity list and Taboada’s adjective list, were introduced with WordNet synonyms and hyponyms for expansion. The semantics feature guided us to build a semantic co-reference relation based network between the related medical concepts. These features help to prepare a medical concept network for better sense relation based visualization. Finally, we evaluated with respect to the Adaptive Lesk Algorithm and conducted an agreement analysis for validating the expanded WME resource.

pdf bib
JU_NLP at SemEval-2016 Task 6: Detecting Stance in Tweets using Support Vector Machines
Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
JU_NLP at SemEval-2016 Task 11: Identifying Complex Words in a Sentence
Niloy Mukherjee | Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
Multimodal Mood Classification - A Case Study of Differences in Hindi and Western Songs
Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Music information retrieval has emerged as a mainstream research area in the past two decades. Experiments on music mood classification have been performed mainly on Western music based on audio, lyrics and a combination of both. Unfortunately, due to the scarcity of digitalized resources, Indian music fares poorly in music mood retrieval research. In this paper, we identified the mood taxonomy and prepared multimodal mood annotated datasets for Hindi and Western songs. We identified important audio and lyric features using a correlation based feature selection technique. Finally, we developed mood classification systems using Support Vector Machines and Feed Forward Neural Networks based on the features collected from audio, lyrics, and a combination of both. The best performing multimodal systems achieved F-measures of 75.1 and 83.5 for classifying the moods of the Hindi and Western songs respectively using Feed Forward Neural Networks. A comparative analysis indicates that the selected features work well for mood classification of the Western songs and produce better results as compared to the mood classification systems for Hindi songs.

2015

pdf bib
Mood Classification of Hindi Songs based on Lyrics
Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 12th International Conference on Natural Language Processing

2014

pdf bib
JU_CSE: A Conditional Random Field (CRF) Based Approach to Aspect Based Sentiment Analysis
Braja Gopal Patra | Soumik Mandal | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Word Alignment-Based Reordering of Source Chunks in PB-SMT
Santanu Pal | Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Reordering poses a big challenge in statistical machine translation between distant language pairs. The paper presents how reordering between distant language pairs can be handled efficiently in phrase-based statistical machine translation. The problem of reordering between distant languages has been approached with prior reordering of the source text at chunk level to simulate the target language ordering. Prior reordering of the source chunks is performed in the present work by following the target word order suggested by word alignment. The test set is reordered using monolingual MT trained on source and reordered source. This approach of prior reordering of the source chunks was compared with pre-ordering of source words based on word alignments and the traditional approach of prior source reordering based on language-pair specific reordering rules. The effects of these reordering approaches were studied on an English–Bengali translation task, a language pair with different word order. From the experimental results it was found that word alignment based reordering of the source chunks is more effective than the other reordering approaches, and it produces statistically significant improvements over the baseline system on BLEU. On manual inspection we found significant improvements in terms of word alignments.

pdf bib
How Sentiment Analysis Can Help Machine Translation
Santanu Pal | Braja Gopal Patra | Dipankar Das | Sudip Kumar Naskar | Sivaji Bandyopadhyay | Josef van Genabith
Proceedings of the 11th International Conference on Natural Language Processing

pdf bib
Manipuri Chunking: An Incremental Model with POS and RMWE
Kishorjit Nongmeikapam | Thiyam Ibungomacha Singh | Ngariyanbam Mayekleima Chanu | Sivaji Bandyopadhyay
Proceedings of the 11th International Conference on Natural Language Processing

2013

pdf bib
JU_CSE: A CRF Based Approach to Annotation of Temporal Expression, Event and Temporal Relations
Anup Kumar Kolya | Amitava Kundu | Rajdeep Gupta | Asif Ekbal | Sivaji Bandyopadhyay
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
Construction of Emotional Lexicon Using Potts Model
Braja Gopal Patra | Hiroya Takamura | Dipankar Das | Manabu Okumura | Sivaji Bandyopadhyay
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
An Empirical Study of Combing Multiple Models in Bengali Question Classification
Somnath Banerjee | Sivaji Bandyopadhyay
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Improving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora
Rajdeep Gupta | Santanu Pal | Sivaji Bandyopadhyay
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

pdf bib
A Hybrid Word Alignment Model for Phrase-Based Statistical Machine Translation
Santanu Pal | Sudip Naskar | Sivaji Bandyopadhyay
Proceedings of the Second Workshop on Hybrid Approaches to Translation

pdf bib
Proceedings of the 3rd Workshop on Sentiment Analysis where AI meets Psychology
Sivaji Bandyopadhyay | Manabu Okumura
Proceedings of the 3rd Workshop on Sentiment Analysis where AI meets Psychology

pdf bib
Automatic Music Mood Classification of Hindi Songs
Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on Sentiment Analysis where AI meets Psychology

pdf bib
Event and Event Actor Alignment in Phrase Based Statistical Machine Translation
Anup Kolya | Santanu Pal | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 11th Workshop on Asian Language Resources

pdf bib
On Application of Conditional Random Field in Stemming of Bengali Natural Language Text
Sandipan Sarkar | Sivaji Bandyopadhyay
Proceedings of the 4th Workshop on South and Southeast Asian Natural Language Processing

pdf bib
MWE Alignment in Phrase Based Statistical Machine Translation
Santanu Pal | Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of Machine Translation Summit XIV: Papers

pdf bib
Emotion Co-referencing - Emotional Expression, Holder, and Topic
Dipankar Das | Sivaji Bandyopadhyay
International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 1, March 2013

2012

pdf bib
A Light Weight Stemmer in Kokborok
Braja Gopal Patra | Khumbar Debbarma | Swapan Debbarma | Dipankar Das | Amitava Das | Sivaji Bandyopadhyay
Proceedings of the 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012)

pdf bib
Bootstrapping Method for Chunk Alignment in Phrase Based SMT
Santanu Pal | Sivaji Bandyopadhyay
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

pdf bib
Detection and Correction of Preposition and Determiner Errors in English: HOO 2012
Pinaki Bhaskar | Aniruddha Ghosh | Santanu Pal | Sivaji Bandyopadhyay
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

pdf bib
Bengali Question Classification: Towards Developing QA System
Somnath Banerjee | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Morphological Analyzer for Kokborok
Khumbar Debbarma | Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Manipuri Morpheme Identification
Kishorjit Nongmeikapam | Vidya Raj RK | Nirmal Y | Sivaji B
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology
Sivaji Bandyopadhyay | Manabu Okumura
Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology

pdf bib
Classification of Interviews - A Case Study on Cancer Patients
Braja Gopal Patra | Amitava Kundu | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology

pdf bib
Question Classification and Answering from Procedural Text in English
Somnath Banerjee | Sivaji Bandyopadhyay
Proceedings of the Workshop on Question Answering for Complex Domains

pdf bib
Part of Speech (POS) Tagger for Kokborok
Braja Gopal Patra | Khumbar Debbarma | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of COLING 2012: Posters

pdf bib
Keyphrase Extraction in Scientific Articles: A Supervised Approach
Pinaki Bhaskar | Kishorjit Nongmeikapam | Sivaji Bandyopadhyay
Proceedings of COLING 2012: Demonstration Papers

pdf bib
JU_CSE_NLP: Multi-grade Classification of Semantic Similarity between Text Pairs
Snehasis Neogi | Partha Pakray | Sivaji Bandyopadhyay | Alexander Gelbukh
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
JU_CSE_NLP: Language Independent Cross-lingual Textual Entailment System
Snehasis Neogi | Partha Pakray | Sivaji Bandyopadhyay | Alexander Gelbukh
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

pdf bib
A Hybrid Approach for Event Extraction and Event Actor Identification
Anup Kumar Kolya | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
Integration of Reduplicated Multiword Expressions and Named Entities in a Phrase Based Statistical Machine Translation System
Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Handling Multiword Expressions in Phrase-Based Statistical Machine Translation
Santanu Pal | Tanmoy Chakraborty | Sivaji Bandyopadhyay
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
Dr Sentiment Knows Everything!
Amitava Das | Sivaji Bandyopadhyay
Proceedings of the ACL-HLT 2011 System Demonstrations

pdf bib
Semantic Clustering: an Attempt to Identify Multiword Expressions in Bengali
Tanmoy Chakraborty | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World

pdf bib
Identifying Event-Sentiment Association using Lexical Equivalence and Co-reference Approaches
Anup Kolya | Dipankar Das | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the ACL 2011 Workshop on Relational Models of Semantics

pdf bib
Shared Task System Description: Measuring the Compositionality of Bigrams using Statistical Methodologies
Tanmoy Chakraborty | Santanu Pal | Tapabrata Mondal | Tanik Saikh | Sivaju Bandyopadhyay
Proceedings of the Workshop on Distributional Semantics and Compositionality

pdf bib
Developing Japanese WordNet Affect for Analyzing Emotions
Yoshimitsu Torii | Dipankar Das | Sivaji Bandyopadhyay | Manabu Okumura
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

pdf bib
May I check the English of your paper!!!
Pinaki Bhaskar | Aniruddha Ghosh | Santanu Pal | Sivaji Bandyopadhyay
Proceedings of the 13th European Workshop on Natural Language Generation

pdf bib
A Rule Based Approach for Analysis of Comparative or Evaluative Questions in Tourism Domain
Bidhan Chandra Pal | Pinaki Bhaskar | Sivaji Bandyopadhyay
Proceedings of the KRAQ11 workshop

pdf bib
Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011)
Sivaji Bandyopadhyay | Manabu Okumura
Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011)

pdf bib
Analyzing Emotional Statements – Roles of General and Physiological Variables
Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011)

2010

pdf bib
JU: A Supervised Approach to Identify Semantic Relations from Paired Nominals
Santanu Pal | Partha Pakray | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
JU_CSE_TEMP: A First Step towards Evaluating Events, Time Expressions and Temporal Relations
Anup Kumar Kolya | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Identifying Emotional Expressions, Intensities and Sentence Level Emotion Tags Using a Supervised Framework
Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
A Supervised Machine Learning Approach for Event-Event Relation Identification
Anup Kumar Kolya | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
A Query Focused Multi Document Automatic Summarization
Pinaki Bhaskar | Sivaji Bandyopadhyay
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
Finding Emotion Holder from Bengali Blog Texts—An Unsupervised Syntactic Approach
Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
Towards the Global SentiWordNet
Amitava Das | Sivaji Bandyopadhyay
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
Topic-Based Bengali Opinion Summarization
Amitava Das | Sivaji Bandyopadhyay
Coling 2010: Posters

pdf bib
English to Indian Languages Machine Transliteration System at NEWS 2010
Amitava Das | Tanik Saikh | Tapabrata Mondal | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 2010 Named Entities Workshop

pdf bib
Labeling Emotion in Bengali Blog Corpus – A Fine Grained Tagging at Sentence Level
Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the Eighth Workshop on Asian Language Resouces

pdf bib
SentiWordNet for Indian Languages
Amitava Das | Sivaji Bandyopadhyay
Proceedings of the Eighth Workshop on Asian Language Resouces

pdf bib
SemanticNet-Perception of Human Pragmatics
Amitava Das | Sivaji Bandyopadhyay
Proceedings of the 2nd Workshop on Cognitive Aspects of the Lexicon

pdf bib
Clause Identification and Classification in Bengali
Aniruddha Ghosh | Amitava Das | Sivaji Bandyopadhyay
Proceedings of the 1st Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Web Based Manipuri Corpus for Multiword NER and Reduplicated MWEs Identification using SVM
Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 1st Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Automatic Extraction of Complex Predicates in Bengali
Dipankar Das | Santanu Pal | Tapabrata Mondal | Tanmoy Chakraborty | Sivaji Bandyopadhyay
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

pdf bib
Handling Named Entities and Compound Verbs in Phrase-Based Statistical Machine Translation
Santanu Pal | Sudip Kumar Naskar | Pavel Pecina | Sivaji Bandyopadhyay | Andy Way
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

pdf bib
Identification of Reduplication in Bengali Corpus and their Semantic Analysis: A Rule Based Approach
Tanmoy Chakraborty | Sivaji Bandyopadhyay
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

pdf bib
Manipuri-English Bidirectional Statistical Machine Translation Systems using Morphology and Dependency Relations
Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

pdf bib
JU_CSE_GREC10: Named Entity Generation at GREC 2010
Amitava Das | Tanik Saikh | Tapabrata Mondal | Sivaji Bandyopadhyay
Proceedings of the 6th International Natural Language Generation Conference

pdf bib
Discerning Emotions of Bloggers based on Topics – a Supervised Coreference Approach in Bengali
Dipankar Das | Sivaji Bandyopadhyay
ROCLING 2010 Poster Papers

2009

pdf bib
Word to Sentence Level Emotion Tagging for Bengali Blogs
Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf bib
Voted Approach for Part of Speech Tagging in Bengali
Asif Ekbal | Md. Hasanuzzaman | Sivaji Bandyopadhyay
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

pdf bib
Named Entity Recognition for Manipuri Using Support Vector Machine
Thoudam Doren Singh | Kishorjit Nongmeikapam | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

pdf bib
Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3)
Sivaji Bandyopadhyay | Pushpak Bhattacharyya | Vasudeva Varma | Sudeshna Sarkar | A Kumaran | Raghavendra Udupa
Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3)

pdf bib
JUNLG-MSR: A Machine Learning Approach of Main Subject Reference Selection with Rule Based Improvement
Samir Gupta | Sivaji Bandopadhyay
Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009)

pdf bib
Bengali Verb Subcategorization Frame Acquisition - A Baseline Model
Somnath Banerjee | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

pdf bib
English to Hindi Machine Transliteration System at NEWS 2009
Amitava Das | Asif Ekbal | Tapabrata Mondal | Sivaji Bandyopadhyay
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

pdf bib
Voted NER System using Appropriate Unlabeled Data
Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

2008

pdf bib
Named Entity Recognition in Bengali: A Conditional Random Field Approach
Asif Ekbal | Rejwanul Haque | Sivaji Bandyopadhyay
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Generation of Referring Expression Using Prefix Tree Structure
Sibabrata Paladhi | Sivaji Bandyopadhyay
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Design of a Rule-based Stemmer for Natural Language Text in Bengali
Sandipan Sarkar | Sivaji Bandyopadhyay
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

pdf bib
Morphology Driven Manipuri POS Tagger
Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

pdf bib
Invited Talk: Multilingual Named Entity Recognition
Sivaji Bandyopadhyay
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages

pdf bib
Language Independent Named Entity Recognition in Indian Languages
Asif Ekbal | Rejwanul Haque | Amitava Das | Venkateswarlu Poka | Sivaji Bandyopadhyay
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages

pdf bib
Bengali Named Entity Recognition Using Support Vector Machine
Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages

pdf bib
A Document Graph Based Query Focused Multi-Document Summarizer
Sibabrata Paladhi | Sivaji Bandyopadhyay
Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies

pdf bib
Bengali, Hindi and Telugu to English Ad-hoc Bilingual Task
Sivaji Bandyopadhyay | Tapabrata Mondal | Sudip Kumar Naskar | Asif Ekbal | Rejwanul Haque | Srinivasa Rao Godavarthy
Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies

pdf bib
Development of Bengali Named Entity Tagged Corpus and its Use in NER Systems
Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 6th Workshop on Asian Language Resources

pdf bib
Multi-Engine Approach for Named Entity Recognition in Bengali
Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation

pdf bib
JU-PTBSGRE: GRE Using Prefix Tree Based Structure
Sibabrata Paladhi | Sivaji Bandyopadhyay
Proceedings of the Fifth International Natural Language Generation Conference

pdf bib
Coling 2008: Proceedings of the workshop Multi-source Multilingual Information Extraction and Summarization
Sivaji Bandyopadhyay | Thierry Poibeau | Horacio Saggion | Roman Yangarber
Coling 2008: Proceedings of the workshop Multi-source Multilingual Information Extraction and Summarization

2007

pdf bib
JU-SKNSB: Extended WordNet Based WSD on the English All-Words Task at SemEval-1
Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

pdf bib
A Modified Joint Source-Channel Model for Transliteration
Asif Ekbal | Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
Dialogue based Question Answering System in Telugu
Rami Reddy | Nandi Reddy | Sivaji Bandyopadhyay
Proceedings of the Workshop on Multilingual Question Answering - MLQA ‘06

pdf bib
Handling of Prepositions in English to Bengali Machine Translation
Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of the Third ACL-SIGSEM Workshop on Prepositions

2005

pdf bib
A Phrasal EBMT System for Translating English to Bengali
Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of Machine Translation Summit X: Posters

The present work describes a Phrasal Example Based Machine Translation system from English to Bengali that identifies the phrases in the input through a shallow analysis, retrieves the target phrases using a Phrasal Example Base and finally combines the target language phrases employing some heuristics based on the phrase ordering rules for Bengali. The paper focuses on the structure of the noun, verb and prepositional phrases in English and how these phrases are realized in Bengali. This study informs the design of the Phrasal Example Base and the recombination rules for the target language phrases.

pdf bib
Use of Machine Translation in India: Current Status
Sudip Naskar | Sivaji Bandyopadhyay
Proceedings of Machine Translation Summit X: Posters

A survey of the machine translation systems that have been developed in India for translation from English to Indian languages and among Indian languages reveals that the MT software is used in field testing or is available as a web translation service. These systems are also used for teaching machine translation to students and researchers. Most of these systems are in the English-Hindi or Indian language-Indian language domain. The translation domains are mostly government documents/reports and news stories. There are a number of other MT systems that are at various phases of development and have been demonstrated at various forums. Many of these systems cover other Indian languages besides Hindi.

pdf bib
A Semantics-based English-Bengali EBMT System for Translating News Headlines
Diganta Saha | Sivaji Bandyopadhyay
Workshop on example-based machine translation

The paper reports an Example Based Machine Translation system for translating news headlines from English to Bengali. The input headline is first searched in the Direct Example Base. If it cannot be found, the input headline is tagged and the tagged headline is searched in the Generalized Tagged Example Base. If a match is obtained, the tagged headline in Bengali is retrieved from the example base, and the output Bengali headline is generated by retrieving the Bengali equivalents of the English words from appropriate dictionaries and then applying relevant synthesis rules to generate the Bengali surface level words. If some named entities and acronyms are not present in the dictionary, a transliteration scheme is applied to obtain the Bengali equivalent. If a match is not found, the tagged input headline is analysed to identify the constituent phrase(s). The target translation is then generated using an English-Bengali phrasal example base, appropriate dictionaries and a set of heuristics for Bengali phrase reordering. If the headline still cannot be translated using the example base strategy, a heuristic translation strategy is applied. Any new tagged input headline, along with its translation by the user, is inserted in the tagged example base after generalization.
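
The staged lookup described in this abstract can be sketched as a fallback chain. This is only an illustrative outline under stated assumptions, not the system from the paper: the names `translate_with_fallback`, `tag`, `DIRECT_BASE` and `TAGGED_BASE` are hypothetical, the tagger is a crude capitalisation test, the phrasal stage is a stub, and the bracketed outputs are placeholders rather than real Bengali.

```python
from typing import Callable, List, Optional, Tuple

Strategy = Tuple[str, Callable[[str], Optional[str]]]

# Toy example bases; the bracketed strings are placeholders, not real Bengali.
DIRECT_BASE = {"pm visits delhi": "BN[pm visits delhi]"}
TAGGED_BASE = {"<NE> wins election": "BN[<NE> wins election]"}


def tag(headline: str) -> str:
    """Hypothetical tagger: capitalised words become <NE>, others are lowercased."""
    return " ".join("<NE>" if w[0].isupper() else w.lower() for w in headline.split())


def direct_lookup(headline: str) -> Optional[str]:
    # Stage 1: exact match in the direct example base.
    return DIRECT_BASE.get(headline.lower())


def generalized_lookup(headline: str) -> Optional[str]:
    # Stage 2: match the tagged headline against generalized templates.
    return TAGGED_BASE.get(tag(headline))


def phrasal_translation(headline: str) -> Optional[str]:
    # Stage 3: phrase-level translation; not modelled here, so it always fails.
    return None


def heuristic_translation(headline: str) -> Optional[str]:
    # Stage 4: last-resort strategy that always produces some output.
    return f"BN-heuristic[{headline}]"


STRATEGIES: List[Strategy] = [
    ("direct", direct_lookup),
    ("generalized", generalized_lookup),
    ("phrasal", phrasal_translation),
    ("heuristic", heuristic_translation),
]


def translate_with_fallback(headline: str, strategies: List[Strategy]) -> Tuple[str, str]:
    """Try each strategy in order and return (stage name, output) for the first hit."""
    for name, strategy in strategies:
        result = strategy(headline)
        if result is not None:
            return name, result
    raise LookupError(f"no strategy could translate: {headline!r}")
```

Returning the stage name alongside the output makes it easy to see which example base answered a given headline, which is useful when deciding which new headlines to generalize and add back.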

2002

pdf bib
Teaching MT - an Indian pespective
Sivaji Bandyopadhyay
Proceedings of the 6th EAMT Workshop: Teaching Machine Translation

2000

pdf bib
An example-based MT system in news items domain from English to Indian languages
Sivaji Bandyopadhyay
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000

pdf bib
Detection and Correction of Phonetic Errors with a New Orthographic Dictionary
Sivaji Bandyopadhyay
Proceedings of the 14th Pacific Asia Conference on Language, Information and Computation