2025
JU-CSE-NLP’s Cascaded Speech to Text Translation Systems for IWSLT 2025 in Indic Track
Debjit Dhar | Soham Lahiri | Tapabrata Mondal | Sivaji Bandyopadhyay
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
This paper presents the submission of the Jadavpur University Computer Science and Engineering Natural Language Processing (JU-CSE-NLP) Laboratory to the International Conference on Spoken Language Translation (IWSLT) 2025 Indic track, addressing the speech-to-text translation task in both English-to-Indic (Bengali, Hindi, Tamil) and Indic-to-English directions. To tackle the challenges posed by low-resource Indian languages, we adopt a cascaded approach leveraging state-of-the-art pre-trained models. For English-to-Indic translation, we use OpenAI’s Whisper model for Automatic Speech Recognition (ASR), followed by Meta’s No Language Left Behind (NLLB)-200-distilled-600M model fine-tuned for Machine Translation (MT). For the reverse direction, we employ AI4Bharat’s IndicConformer model for ASR and IndicTrans2 fine-tuned for MT. Our models are fine-tuned on the provided benchmark dataset to better handle the linguistic diversity and domain-specific variations inherent in the data. Evaluation results demonstrate that our cascaded systems achieve competitive performance, with notable BLEU and chrF++ scores across all language pairs. Our findings highlight the effectiveness of combining robust ASR and MT components in a cascaded pipeline, particularly for low-resource and morphologically rich Indian languages.
Generating and Analyzing Disfluency in a Code-Mixed Setting
Aryan Paul | Tapabrata Mondal | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
This work explores the intersection of code-mixing and disfluency in bilingual speech and text, with a focus on understanding how large language models (LLMs) handle code-mixed disfluent utterances. One of the primary objectives is to explore LLMs’ ability to generate code-mixed disfluent sentences and to address the lack of high-quality code-mixed disfluent corpora, particularly for Indic languages. We aim to compare the performance of LLM-based approaches with traditional disfluency detection methods and to develop novel metrics for quantitatively assessing disfluency phenomena. Additionally, we investigate the relationship between code-mixing and disfluency, exploring how factors such as switching frequency and direction influence the occurrence of disfluencies. By analyzing these intriguing dynamics, we seek to gain a deeper understanding of the mutual influence between code-mixing and disfluency in multilingual speech.
2011
Shared Task System Description: Measuring the Compositionality of Bigrams using Statistical Methodologies
Tanmoy Chakraborty | Santanu Pal | Tapabrata Mondal | Tanik Saikh | Sivaji Bandyopadhyay
Proceedings of the Workshop on Distributional Semantics and Compositionality
2010
English to Indian Languages Machine Transliteration System at NEWS 2010
Amitava Das | Tanik Saikh | Tapabrata Mondal | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 2010 Named Entities Workshop
Automatic Extraction of Complex Predicates in Bengali
Dipankar Das | Santanu Pal | Tapabrata Mondal | Tanmoy Chakraborty | Sivaji Bandyopadhyay
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications
JU_CSE_GREC10: Named Entity Generation at GREC 2010
Amitava Das | Tanik Saikh | Tapabrata Mondal | Sivaji Bandyopadhyay
Proceedings of the 6th International Natural Language Generation Conference
2009
English to Hindi Machine Transliteration System at NEWS 2009
Amitava Das | Asif Ekbal | Tapabrata Mondal | Sivaji Bandyopadhyay
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)
2008
Bengali, Hindi and Telugu to English Ad-hoc Bilingual Task
Sivaji Bandyopadhyay | Tapabrata Mondal | Sudip Kumar Naskar | Asif Ekbal | Rejwanul Haque | Srinivasa Rao Godavarthy
Proceedings of the 2nd Workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies