Alok Singh


2024

Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)
Dominik Stammbach | Jingwei Ni | Tobias Schimanski | Kalyan Dutia | Alok Singh | Julia Bingler | Christophe Christiaen | Neetu Kushwaha | Veruska Muccione | Saeid A. Vaghefi | Markus Leippold
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)

2021

An Experiment on Speech-to-Text Translation Systems for Manipuri to English on Low Resource Setting
Loitongbam Sanayai Meetei | Laishram Rahul | Alok Singh | Salam Michael Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

In this paper, we report the experimental findings of building Speech-to-Text translation systems for Manipuri-English in a low resource setting, the first of its kind for this language pair. For this purpose, a new dataset is built consisting of a Manipuri-English parallel corpus along with the corresponding audio version of the Manipuri text. Based on this dataset, a benchmark evaluation is reported for Manipuri-English Speech-to-Text translation using two approaches: 1) a pipeline model consisting of ASR (Automatic Speech Recognition) and machine translation, and 2) an end-to-end Speech-to-Text translation model. Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) and Time Delay Neural Network (TDNN) acoustic models are used to build two different pipeline systems sharing a single MT system. Experimental results show that the TDNN model significantly outperforms the GMM-HMM model by a margin of 2.53% WER; however, the two pipelines' Speech-to-Text translation scores differ by only 0.1 BLEU. Both pipeline translation models outperform the end-to-end translation model by a margin of 2.6 BLEU.
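For illustration only, a minimal sketch of the cascade (pipeline) approach described above, written with Hugging Face's pipeline API; the checkpoint names are hypothetical placeholders, not the paper's Kaldi-based GMM-HMM/TDNN acoustic models or its shared MT system:

from transformers import pipeline

# Hypothetical checkpoint names -- stand-ins for the paper's Kaldi-based
# GMM-HMM / TDNN acoustic models and its shared MT system.
asr = pipeline("automatic-speech-recognition",
               model="example/manipuri-asr")          # hypothetical
mt = pipeline("translation",
              model="example/manipuri-english-mt")    # hypothetical

def cascade_st(audio_path: str) -> str:
    """Pipeline approach: transcribe Manipuri speech, then translate to English."""
    manipuri_text = asr(audio_path)["text"]
    return mt(manipuri_text)[0]["translation_text"]

print(cascade_st("sample.wav"))  # -> English translation of the utterance

The end-to-end alternative evaluated in the paper would replace both stages with a single model mapping audio directly to English text.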

On the Transferability of Massively Multilingual Pretrained Models in the Pretext of the Indo-Aryan and Tibeto-Burman Languages
Salam Michael Singh | Loitongbam Sanayai Meetei | Alok Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Recent machine translation models can learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning helps languages with constrained resources. This work investigates low resource machine translation via transfer learning from the multilingual pretrained models mBART-50 and mT5-base, in the pretext of Indo-Aryan (Assamese and Bengali) and Tibeto-Burman (Manipuri) languages, with finetuning as the downstream task. Assamese and Manipuri were absent from the pretraining of both mBART-50 and mT5. Nevertheless, the experimental results attest that finetuning from these pretrained models surpasses a multilingual model trained from scratch.
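As a simplified illustration of the finetuning-as-downstream-task recipe, using the real mBART-50 and mT5-base checkpoints (the data handling and training loop here are assumptions, not the authors' exact configuration):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Real checkpoints named in the abstract; everything below is a sketch.
model_name = "facebook/mbart-large-50"          # or "google/mt5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# mBART-50 expects language codes; bn_IN (Bengali) is a real tag, while
# Assamese and Manipuri have none, as the abstract notes.
tokenizer.src_lang, tokenizer.tgt_lang = "bn_IN", "en_XX"

batch = tokenizer("<source sentence>", text_target="<target sentence>",
                  return_tensors="pt")
loss = model(**batch).loss     # standard cross-entropy finetuning objective
loss.backward()                # an optimiser step would follow in training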

An Efficient Keyframes Selection Based Framework for Video Captioning
Alok Singh | Loitongbam Sanayai Meetei | Salam Michael Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Describing a video is a challenging yet attractive task since it lies at the intersection of computer vision and natural language generation. Attention-based models have reported the best performance; however, these models follow similar procedures, such as segmenting videos into chunks of frames or sampling frames at equal intervals for visual encoding. Segmenting a video into chunks or sampling frames at equal intervals encodes redundant visual information and incurs additional computational cost, since a video consists of sequences of similar frames and suffers from inescapable noise such as uneven illumination, occlusion, and motion effects. In this paper, a boundary-based keyframe selection approach for video description is proposed that allows the system to select a compact subset of keyframes to encode the visual information and generate a description for a video without much degradation. The proposed approach uses 3 to 4 frames per video and yields competitive performance on two benchmark datasets, MSVD and MSR-VTT (in both English and Hindi).
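For illustration, a minimal sketch of boundary-based keyframe selection using colour-histogram differences in OpenCV; the Bhattacharyya distance and the threshold value are assumptions for the sketch, not necessarily the paper's boundary criterion:

import cv2

def select_keyframes(video_path: str, threshold: float = 0.5):
    """Keep frames whose colour histogram differs sharply from the previous
    frame, approximating shot-boundary keyframes (a sketch; the paper's
    exact boundary criterion may differ)."""
    cap = cv2.VideoCapture(video_path)
    keyframes, prev_hist = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is None or cv2.compareHist(
                prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
            keyframes.append(frame)   # boundary detected: keep this frame
        prev_hist = hist
    cap.release()
    return keyframes

Only the selected keyframes (3 to 4 per video, per the abstract) would then be passed to the visual encoder, avoiding the redundant computation of fixed-interval sampling.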