Nitin Venkateswaran


2024

Looking within the self: Investigating the Impact of Data Augmentation with Self-training on Automatic Speech Recognition for Hupa
Nitin Venkateswaran | Zoey Liu
Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages

We investigate the performance of state-of-the-art neural ASR systems in transcribing audio recordings for Hupa, a critically endangered language of the Hoopa Valley Tribe. We also explore the impact on ASR performance of augmenting a small dataset of gold-standard, high-quality transcriptions with a) a larger dataset with transcriptions of lower quality, and b) model-generated transcriptions in a self-training approach. An evaluation of both data augmentation approaches shows that the self-training approach is competitive: it produces better WER scores than models trained with no additional data and does not lag far behind models trained with additional lower-quality manual transcriptions instead. The deterioration in WER score is just 4.85 points when all the additional data is used in experiments with the best-performing system, Wav2Vec. These findings have encouraging implications for the use of ASR systems for transcription and language documentation efforts in the Hupa language.

2022

MASALA: Modelling and Analysing the Semantics of Adpositions in Linguistic Annotation of Hindi
Aryaman Arora | Nitin Venkateswaran | Nathan Schneider
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present a completed, publicly available corpus of annotated semantic relations of adpositions and case markers in Hindi. We use the multilingual SNACS annotation scheme, which has been applied to a variety of typologically diverse languages. Building on past work examining linguistic problems in SNACS annotation, we use language models to attempt automatic labelling of SNACS supersenses in Hindi and achieve results competitive with past work on English. We look towards upstream applications in semantic role labelling and extension to related languages such as Gujarati.

2021

SNACS Annotation of Case Markers and Adpositions in Hindi
Aryaman Arora | Nitin Venkateswaran | Nathan Schneider
Proceedings of the Society for Computation in Linguistics 2021