Prashanth Nayak


2024

The SETU-DCU Submissions to IWSLT 2024 Low-Resource Speech-to-Text Translation Tasks
Maria Zafar | Antonio Castaldo | Prashanth Nayak | Rejwanul Haque | Neha Gajakos | Andy Way
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

Natural Language Processing (NLP) research and development has experienced rapid progress in recent years due to advances in deep learning. The introduction of pre-trained large language models (LLMs) is at the core of this transformation, significantly enhancing the performance of machine translation (MT) and speech technologies. This development has also led to fundamental changes in modern translation and speech tools and their methodologies. However, challenges remain in extending this progress to underrepresented dialects and low-resource languages, primarily due to the scarcity of data. This paper details our submissions to the IWSLT speech translation (ST) tasks. We used the Whisper model for the automatic speech recognition (ASR) component, and mBART and NLLB as the MT components in cascaded systems. Our research primarily focused on exploring various dialects of low-resource languages and harnessing existing resources from linguistically related languages. We conducted our experiments for two morphologically diverse language pairs: Irish-to-English and Maltese-to-English. We used BLEU, chrF and COMET for evaluating our MT models.
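As an illustration of the cascaded setup described above, the sketch below chains a Whisper ASR pipeline into an NLLB translation pipeline using the Hugging Face transformers library. The checkpoints, language codes and audio file name are assumptions for the example; the submissions' actual fine-tuned models may differ.

```python
# A minimal sketch of a cascaded speech-to-text translation system:
# Whisper transcribes the source-language audio, NLLB translates the
# transcript into English. Checkpoints below are assumptions, not the
# fine-tuned models used in the actual submissions.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1

# Step 1: ASR with Whisper.
asr = pipeline("automatic-speech-recognition",
               model="openai/whisper-small", device=device)

# Step 2: MT with NLLB ("mlt_Latn" is NLLB's code for Maltese;
# Irish would be "gle_Latn").
mt = pipeline("translation",
              model="facebook/nllb-200-distilled-600M",
              src_lang="mlt_Latn", tgt_lang="eng_Latn", device=device)

def speech_to_text_translation(audio_path: str) -> str:
    transcript = asr(audio_path)["text"]
    return mt(transcript)[0]["translation_text"]

# Hypothetical input file for illustration.
print(speech_to_text_translation("maltese_utterance.wav"))
```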

The SETU-ADAPT Submission for WMT 24 Biomedical Shared Task
Antonio Castaldo | Maria Zafar | Prashanth Nayak | Rejwanul Haque | Andy Way | Johanna Monti
Proceedings of the Ninth Conference on Machine Translation

This system description paper presents SETU-ADAPT’s submission to the WMT 2024 Biomedical Shared Task, where we participated for the language pairs English-to-French and English-to-German. Our approach focused on fine-tuning Large Language Models, using in-domain and synthetic data, employing different data augmentation and data retrieval strategies. We introduce a novel MT framework, involving three autonomous agents: a Translator Agent, an Evaluator Agent and a Reviewer Agent. We present our findings and report the quality of the outputs.
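The abstract does not spell out how the three agents interact, but a translate-evaluate-revise loop of the following shape is one plausible reading. Everything here (the llm placeholder, the prompts, the stopping rule) is a hypothetical sketch, not the authors' implementation.

```python
# Hypothetical sketch of a Translator/Evaluator/Reviewer agent loop.
# `llm` stands in for any chat-completion call; prompts, models and
# stopping criteria are assumptions, not taken from the paper.
def llm(prompt: str) -> str:
    """Placeholder for a large-language-model API call."""
    raise NotImplementedError

def translate_with_agents(source: str, tgt: str = "French",
                          max_rounds: int = 3) -> str:
    # Translator Agent: produce an initial draft.
    draft = llm(f"Translate this biomedical text into {tgt}:\n{source}")
    for _ in range(max_rounds):
        # Evaluator Agent: judge adequacy and terminology.
        critique = llm(f"Source: {source}\nTranslation: {draft}\n"
                       "Reply OK if acceptable, otherwise list the issues.")
        if critique.strip().upper().startswith("OK"):
            break
        # Reviewer Agent: revise the draft using the critique.
        draft = llm(f"Source: {source}\nDraft: {draft}\nIssues: {critique}\n"
                    f"Return only the corrected {tgt} translation.")
    return draft
```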

The SETU-ADAPT Submissions to the WMT24 Low-Resource Indic Language Translation Task
Neha Gajakos | Prashanth Nayak | Rejwanul Haque | Andy Way
Proceedings of the Ninth Conference on Machine Translation

This paper presents SETU-ADAPT’s submissions to the WMT 2024 Low-Resource Indic Language Translation task. We participated in the unconstrained segment of the task, focusing on the Assamese-to-English and English-to-Assamese language pairs. Our approach involves leveraging Large Language Models (LLMs) as the baseline systems for all our MT tasks. Furthermore, we applied various strategies to improve the baseline systems. In our first approach, we fine-tuned LLMs using all the data provided by the task organisers. Our second approach explores in-context learning, focusing on few-shot prompting. In our final approach, we explore an efficient data extraction technique based on a fuzzy match-based similarity measure for fine-tuning. We evaluated our systems using BLEU, chrF, WER, and COMET. The experimental results showed that our strategies can effectively improve the quality of translations in low-resource scenarios.
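Fuzzy match-based data extraction of the kind mentioned above might look like the sketch below: training pairs whose source side is sufficiently similar to a test sentence are kept for fine-tuning. The paper's actual similarity measure and threshold are not given in the abstract; difflib's ratio is used here purely as a stand-in.

```python
# Illustrative sketch of fuzzy match-based data selection for fine-tuning.
# The similarity measure (difflib's ratio) and the threshold are stand-in
# assumptions; the paper's actual measure is not specified in the abstract.
from difflib import SequenceMatcher

def fuzzy_score(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def select_finetuning_data(test_sources, train_pairs, threshold=0.6):
    """train_pairs: list of (source, target) tuples; returns the matched subset."""
    selected = set()
    for query in test_sources:
        for i, (src, _tgt) in enumerate(train_pairs):
            if fuzzy_score(query, src) >= threshold:
                selected.add(i)
    return [train_pairs[i] for i in sorted(selected)]
```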

The SETU-ADAPT Submissions to WMT 2024 Chat Translation Tasks
Maria Zafar | Antonio Castaldo | Prashanth Nayak | Rejwanul Haque | Andy Way
Proceedings of the Ninth Conference on Machine Translation

This paper presents the SETU-ADAPT submissions to the WMT24 Chat Translation Task. Large language models (LLMs) currently provide state-of-the-art solutions to many natural language processing (NLP) problems, including machine translation (MT). For the WMT24 Chat Translation Task we leveraged LLMs for their MT capabilities. In order to adapt the LLMs to a specific domain of interest, we explored different fine-tuning and prompting strategies. We also employed efficient data retrieval methods to curate the data used for fine-tuning. We carried out experiments for two language pairs: German-to-English and French-to-English. Our MT models were evaluated using three metrics: BLEU, chrF and COMET. In this paper we describe our experiments, including training setups, results and findings.
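BLEU and chrF scores of the kind reported in these submissions can be computed with the sacrebleu library, as in the short sketch below (COMET additionally requires a trained neural metric model, e.g. via the unbabel-comet package). The example sentences are invented for illustration.

```python
# Computing BLEU and chrF with sacrebleu; the sentences are invented
# examples, not data from the shared task.
import sacrebleu

hypotheses = ["Thanks, I will check the order status."]         # system outputs
references = [["Thanks, I'll check the status of the order."]]  # one list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")
```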

2023

Instance-Based Domain Adaptation for Improving Terminology Translation
Prashanth Nayak | John Kelleher | Rejwanul Haque | Andy Way
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

Terms are essential indicators of a domain, and domain term translation is treated as a priority in any translation workflow. Translation service providers who use machine translation (MT) expect term translation to be unambiguous and consistent with the context and domain in question. Although current state-of-the-art neural MT (NMT) models are able to produce high-quality translations for many languages, they are still not at the level required when it comes to translating domain-specific terms. This study presents a terminology-aware instance-based adaptation method for improving terminology translation in NMT. We conducted our experiments for French-to-English and found that our proposed approach achieves a statistically significant improvement over the baseline NMT system in translating domain-specific terms. Specifically, the translation of multi-word terms is improved by 6.7% compared to the strong baseline.
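The core of instance-based adaptation is to select training instances relevant to the input before adapting the model. A simplified, hypothetical version of such a terminology-aware selection step is sketched below; the paper's actual scoring and adaptation procedure is more involved, and all names here are illustrative.

```python
# Hypothetical sketch of terminology-aware instance selection: keep the
# bitext pairs that share domain terms with the input sentence, then
# adapt the NMT model on that subset before decoding. The scoring is a
# simplification of what an instance-based adaptation method would use.
def find_terms(sentence: str, term_lexicon: set) -> set:
    text = sentence.lower()
    # Substring match keeps the sketch simple and covers multi-word terms.
    return {t for t in term_lexicon if t.lower() in text}

def select_instances(input_sentence, bitext, term_lexicon, max_instances=50):
    """bitext: list of (src, tgt) pairs; keep pairs sharing terms with the input."""
    query_terms = find_terms(input_sentence, term_lexicon)
    scored = [(len(query_terms & find_terms(src, term_lexicon)), (src, tgt))
              for src, tgt in bitext]
    ranked = [pair for overlap, pair in
              sorted(scored, key=lambda x: -x[0]) if overlap > 0]
    # Fine-tune the NMT model on these instances before translating the input.
    return ranked[:max_instances]
```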

2020

The ADAPT Centre’s Participation in WAT 2020 English-to-Odia Translation Task
Prashanth Nayak | Rejwanul Haque | Andy Way
Proceedings of the 7th Workshop on Asian Translation

This paper describes the ADAPT Centre submissions to WAT 2020 for the English-to-Odia translation task. We present the approaches we followed to build competitive machine translation (MT) systems for English-to-Odia. Our approaches include monolingual data selection for creating synthetic data and identifying optimal sets of hyperparameters for the Transformer in a low-resource scenario. Our best MT system produces 4.96 BLEU points on the evaluation test set in the English-to-Odia translation task.
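Creating synthetic parallel data from selected monolingual text is commonly done via back-translation; a minimal sketch of that step is given below. The reverse-direction MT system is left as a parameter because the abstract does not name the models used.

```python
# Minimal sketch of building synthetic English-to-Odia pairs from selected
# Odia monolingual data via back-translation. The Odia-to-English system is
# passed in as a callable, since the paper does not specify which model
# performed this step.
from typing import Callable, List, Tuple

def make_synthetic_pairs(
    odia_monolingual: List[str],
    backward_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    # The back-translated English becomes the synthetic source side;
    # the authentic Odia sentence is the target side.
    return [(backward_translate(odia), odia) for odia in odia_monolingual]
```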