Nishtha Jain


2023

Using MT for multilingual covid-19 case load prediction from social media texts
Maja Popović | Vasudevan Nedumpozhimana | Meegan Gower | Sneha Rautmare | Nishtha Jain | John Kelleher
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

In the context of an epidemiological study involving multilingual social media, this paper reports on the ability of machine translation systems to preserve content relevant for a document classification task designed to determine whether the social media text is related to covid. The results indicate that machine translation does provide a feasible basis for scaling epidemiological social media surveillance to multiple languages. Moreover, a qualitative error analysis revealed that the majority of classification errors are not caused by MT errors.

Medical Concept Mention Identification in Social Media Posts Using a Small Number of Sample References
Vasudevan Nedumpozhimana | Sneha Rautmare | Meegan Gower | Nishtha Jain | Maja Popović | Patricia Buffini | John Kelleher
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Identification of mentions of medical concepts in social media text can provide useful information for caseload prediction of diseases like Covid-19 and Measles. We propose a simple model for the automatic identification of the medical concept mentions in the social media text. We validate the effectiveness of the proposed model on Twitter, Reddit, and News/Media datasets.

2022

Leveraging Pre-trained Language Models for Gender Debiasing
Nishtha Jain | Declan Groves | Lucia Specia | Maja Popović
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Studying and mitigating gender and other biases in natural language have become important areas of research from both algorithmic and data perspectives. This paper explores the idea of reducing gender bias in a language generation context by generating gender variants of sentences. Previous work in this field has either been rule-based or required large amounts of gender-balanced training data. These approaches, however, are not scalable across multiple languages, as creating data or rules for each language is costly and time-consuming. This work explores a lightweight method to generate gender variants for a given text using pre-trained language models as the resource, without any task-specific labelled data. The approach is designed to work on multiple languages with minimal changes in the form of heuristics. To demonstrate this, we tested it on a high-resourced language, namely Spanish, and a low-resourced language from a different family, namely Serbian. The approach proved to work very well on Spanish, and while the results were less positive for Serbian, it showed potential even for languages where pre-trained models are less effective.

2021

Generating Gender Augmented Data for NLP
Nishtha Jain | Maja Popović | Declan Groves | Eva Vanmassenhove
Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing

Gender bias is a frequent occurrence in NLP-based applications, especially pronounced in gender-inflected languages. Bias can appear through associations of certain adjectives and animate nouns with the natural gender of referents, but also due to unbalanced grammatical gender frequencies of inflected words. This type of bias becomes more evident in generating conversational utterances where gender is not specified within the sentence, because most current NLP applications still work on a sentence-level context. As a step towards more inclusive NLP, this paper proposes an automatic and generalisable rewriting approach for short conversational sentences. The rewriting method can be applied to sentences that, without extra-sentential context, have multiple equivalent alternatives in terms of gender. The method can be applied both for creating gender-balanced outputs and for creating gender-balanced training data. The proposed approach is based on a neural machine translation system trained to ‘translate’ from one gender alternative to another. Both the automatic and manual analyses of the approach show promising results with respect to the automatic generation of gender alternatives for conversational sentences in Spanish.