Christine Basta


2022

Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Christian Hardmeier | Christine Basta | Marta R. Costa-jussà | Gabriel Stanovsky | Hila Gonen
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Evaluating Gender Bias in Speech Translation
Marta R. Costa-jussà | Christine Basta | Gerard I. Gállego
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The scientific community is increasingly aware of the need to embrace pluralism and to represent major and minor social groups consistently. Currently, there are no standard evaluation techniques for different types of biases. Accordingly, there is an urgent need for evaluation sets and protocols that measure the biases present in our automatic systems. Evaluating these biases should be an essential step towards mitigating them. This paper introduces WinoST, a new freely available challenge set for evaluating gender bias in speech translation. WinoST is the speech version of WinoMT, an MT challenge set, and both follow an evaluation protocol to measure gender accuracy. Using an S-Transformer end-to-end speech translation system, we report the gender bias evaluation on four language pairs and reveal translation inaccuracies that produce gender-stereotyped output.

2021

Impact of COVID-19 in Natural Language Processing Publications: a Disaggregated Study in Gender, Contribution and Experience
Christine Basta | Marta R. Costa-jussà
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

This study sheds light on the effects of COVID-19 in the particular field of Computational Linguistics and Natural Language Processing within Artificial Intelligence. We provide an intersectional study on gender, contribution, and experience that considers one school year (from August 2019 to August 2020) as a pandemic year. August is included twice for the purpose of an inter-annual comparison. While the number of publications trended upward during the crisis, the results show that the ratio between female and male publications decreased. This further reduces the weight of the female role in the scientific contributions of computational linguistics (the ratio is now far below its peak of 0.24). The pandemic has a particularly negative effect on the production of female senior researchers in the first-author position (maximum work), followed by female junior researchers in the last-author position (supervision or collaborative work).

The TALP-UPC Participation in WMT21 News Translation Task: an mBART-based NMT Approach
Carlos Escolano | Ioannis Tsiamas | Christine Basta | Javier Ferrando | Marta R. Costa-jussà | José A. R. Fonollosa
Proceedings of the Sixth Conference on Machine Translation

This paper describes the submission to the WMT 2021 news translation shared task by the UPC Machine Translation group. The goal of the task is to translate German to French (De-Fr) and French to German (Fr-De). Our submission focuses on fine-tuning a pre-trained model to take advantage of monolingual data. We fine-tune mBART50 using the filtered data, and additionally, we train a Transformer model on the same data from scratch. In the experiments, we show that fine-tuning mBART50 results in 31.69 BLEU for De-Fr and 23.63 BLEU for Fr-De, an improvement of 2.71 and 1.90 BLEU, respectively, over the model we train from scratch. Our final submission is an ensemble of these two models, which gains a further 0.3 BLEU for Fr-De.

Multi-Task Learning for Improving Gender Accuracy in Neural Machine Translation
Carlos Escolano | Graciela Ojeda | Christine Basta | Marta R. Costa-jussà
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Machine Translation is strongly affected by the social biases present in data sets, reflecting and amplifying stereotypes as a result. In this work, we study mitigating gender bias by jointly learning the translation, the part-of-speech, and the gender of the target language, for target languages of differing morphological complexity. This approach has shown improvements of up to 6.8 points in gender accuracy without significantly impacting the translation quality.

2020

Towards Mitigating Gender Bias in a decoder-based Neural Machine Translation model by Adding Contextual Information
Christine Basta | Marta R. Costa-jussà | José A. R. Fonollosa
Proceedings of the Fourth Widening Natural Language Processing Workshop

Gender bias negatively impacts many natural language processing applications, including machine translation (MT). The motivation behind this work is to study whether recently proposed MT techniques contribute significantly to attenuating biases in document-level and gender-balanced data. For the study, we consider approaches of adding the previous sentence and the speaker information, implemented in a decoder-based neural MT system. We show improvements both in translation quality (+1 BLEU point) and in gender bias mitigation on WinoMT (+5% accuracy).

2019

Evaluating the Underlying Gender Bias in Contextualized Word Embeddings
Christine Basta | Marta R. Costa-jussà | Noe Casas
Proceedings of the First Workshop on Gender Bias in Natural Language Processing

Gender bias strongly affects natural language processing applications. Word embeddings have been shown both to preserve and to amplify gender biases that are present in current data sources. Recently, contextualized word embeddings have enhanced previous word embedding techniques by computing word vector representations dependent on the sentence they appear in. In this paper, we study the impact of this conceptual change in the word embedding computation in relation to gender bias. Our analysis includes different measures previously applied in the literature to standard word embeddings. Our findings suggest that contextualized word embeddings are less biased than standard ones even when the latter are debiased.

The TALP-UPC Machine Translation Systems for WMT19 News Translation Task: Pivoting Techniques for Low Resource MT
Noe Casas | José A. R. Fonollosa | Carlos Escolano | Christine Basta | Marta R. Costa-jussà
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

In this article, we describe the TALP-UPC research group participation in the WMT19 news translation shared task for Kazakh-English. Given the small amount of parallel training data, we resort to using Russian as a pivot language, training subword-based statistical translation systems for Russian-Kazakh and Russian-English that were then used to create two synthetic pseudo-parallel corpora for Kazakh-English and English-Kazakh respectively. Finally, a self-attention model based on the decoder part of the Transformer architecture was trained on the two pseudo-parallel corpora.