Amanda Cercas Curry



2024

Classist Tools: Social Class Correlates with Performance in NLP
Amanda Cercas Curry | Giuseppe Attanasio | Zeerak Talat | Dirk Hovy
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The field of sociolinguistics has studied factors affecting language use for the last century. Labov (1964) and Bernstein (1960) showed that socioeconomic class strongly influences our accents, syntax and lexicon. However, despite growing concerns surrounding fairness and bias in Natural Language Processing (NLP), there is a dearth of studies delving into the effects class may have on NLP systems. We show empirically that NLP systems’ performance is affected by speakers’ socioeconomic status (SES), potentially disadvantaging less-privileged socioeconomic groups. We annotate a corpus of 95K utterances from movies with social class, ethnicity and geographical language variety and measure the performance of NLP systems on three tasks: language modelling, automatic speech recognition, and grammatical error correction. We find significant performance disparities that can be attributed to socioeconomic status as well as ethnicity and geographical differences. With NLP technologies becoming ever more ubiquitous and quotidian, they must accommodate all language varieties to avoid disadvantaging already marginalised groups. We argue for the inclusion of socioeconomic class in future language technologies.

Divine LLaMAs: Bias, Stereotypes, Stigmatization, and Emotion Representation of Religion in Large Language Models
Flor Miriam Plaza-del-Arco | Amanda Cercas Curry | Susanna Paoli | Alba Cercas Curry | Dirk Hovy
Findings of the Association for Computational Linguistics: EMNLP 2024

Emotions play important epistemological and cognitive roles in our lives, revealing our values and guiding our actions. Previous work has shown that LLMs display biases in emotion attribution along gender lines. However, unlike gender, which says little about our values, religion, as a socio-cultural system, prescribes a set of beliefs and values for its followers. Religions, therefore, cultivate certain emotions. Moreover, these rules are explicitly laid out and interpreted by religious leaders. Using emotion attribution, we explore how different religions are represented in LLMs. We find that: (1) major religions in the US and European countries are represented with more nuance, displaying a more shaded model of their beliefs; (2) Eastern religions like Hinduism and Buddhism are strongly stereotyped; and (3) Judaism and Islam are stigmatized, as the models’ refusals skyrocket. We ascribe these to cultural bias in LLMs and the scarcity of NLP literature on religion. In the rare instances where religion is discussed, it is often in the context of toxic language, perpetuating the perception of these religions as inherently toxic. This finding underscores the urgent need to address and rectify these biases. Our research emphasizes the crucial role emotions play in shaping our lives and how our values influence them.

Proceedings of the Third Workshop on Bridging Human–Computer Interaction and Natural Language Processing
Su Lin Blodgett | Amanda Cercas Curry | Sunipa Dev | Michael Madaio | Ani Nenkova | Diyi Yang | Ziang Xiao
Proceedings of the Third Workshop on Bridging Human–Computer Interaction and Natural Language Processing

Emotion Analysis in NLP: Trends, Gaps and Roadmap for Future Directions
Flor Miriam Plaza-del-Arco | Alba A. Cercas Curry | Amanda Cercas Curry | Dirk Hovy
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Emotions are a central aspect of communication. Consequently, emotion analysis (EA) is a rapidly growing field in natural language processing (NLP). However, there is no consensus on scope, direction, or methods. In this paper, we conduct a thorough review of 154 relevant NLP publications from the last decade. Based on this review, we address four different questions: (1) How are EA tasks defined in NLP? (2) What are the most prominent emotion frameworks and which emotions are modeled? (3) Is the subjectivity of emotions considered in terms of demographics and cultural factors? and (4) What are the primary NLP applications for EA? We take stock of trends in EA and tasks, emotion frameworks used, existing datasets, methods, and applications. We then discuss four lacunae: (1) the absence of demographic and cultural aspects does not account for the variation in how emotions are perceived, but instead assumes they are universally experienced in the same manner; (2) the poor fit of emotion categories from the two main emotion theories to the task; (3) the lack of standardized EA terminology hinders gap identification, comparison, and future goals; and (4) the absence of interdisciplinary research isolates EA from insights in other fields. Our work will enable more focused research into EA and a more holistic approach to modeling emotions in NLP.

Impoverished Language Technology: The Lack of (Social) Class in NLP
Amanda Cercas Curry | Zeerak Talat | Dirk Hovy
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Since Labov’s foundational 1964 work on the social stratification of language, linguistics has dedicated concerted efforts towards understanding the relationships between socio-demographic factors and language production and perception. Despite the large body of evidence identifying significant relationships between socio-demographic factors and language production, relatively few of these factors have been investigated in the context of NLP technology. While age and gender are well covered, Labov’s initial target, socio-economic class, is largely absent. We survey the existing Natural Language Processing (NLP) literature and find that only 20 papers even mention socio-economic status. However, the majority of those papers do not engage with class beyond collecting information on annotator demographics. Given this research lacuna, we provide a definition of class that can be operationalised by NLP researchers, and argue for including socio-economic class in future language technologies.

Proceedings of Safety4ConvAI: The Third Workshop on Safety for Conversational AI @ LREC-COLING 2024
Tanvi Dinkar | Giuseppe Attanasio | Amanda Cercas Curry | Ioannis Konstas | Dirk Hovy | Verena Rieser
Proceedings of Safety4ConvAI: The Third Workshop on Safety for Conversational AI @ LREC-COLING 2024

Subjective Isms? On the Danger of Conflating Hate and Offence in Abusive Language Detection
Amanda Cercas Curry | Gavin Abercrombie | Zeerak Talat
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)

Natural language processing research has begun to embrace the notion of annotator subjectivity, motivated by variations in labelling. This approach understands each annotator’s view as valid, which can be highly suitable for tasks that embed subjectivity, e.g., sentiment analysis. However, this construction may be inappropriate for tasks such as hate speech detection, as it affords equal validity to all positions on e.g., sexism or racism. We argue that the conflation of hate and offence can invalidate findings on hate speech, and call for future work to be situated in theory, disentangling hate from its orthogonal concept, offence.

2023

Mirages. On Anthropomorphism in Dialogue Systems
Gavin Abercrombie | Amanda Cercas Curry | Tanvi Dinkar | Verena Rieser | Zeerak Talat
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Automated dialogue or conversational systems are anthropomorphised by developers and personified by users. While a degree of anthropomorphism is inevitable, conscious and unconscious design choices can guide users to personify them to varying degrees. Encouraging users to relate to automated systems as if they were human can lead to transparency and trust issues, and high-risk scenarios caused by over-reliance on their outputs. As a result, natural language processing researchers have investigated the factors that induce personification and developed resources to mitigate such effects. However, these efforts are fragmented, and many aspects of anthropomorphism have yet to be explored. In this paper, we discuss the linguistic factors that contribute to the anthropomorphism of dialogue systems and the harms that can arise from it, including reinforcing gender stereotypes and conceptions of acceptable language. We recommend that future efforts towards developing dialogue systems take particular care in their design, development, release, and description; and attend to the many linguistic cues that can elicit personification by users.

Computer says “No”: The Case Against Empathetic Conversational AI
Alba Curry | Amanda Cercas Curry
Findings of the Association for Computational Linguistics: ACL 2023

Emotions are an integral part of human cognition and they guide not only our understanding of the world but also our actions within it. As such, whether we soothe or flame an emotion is not inconsequential. Recent work in conversational AI has focused on responding empathetically to users, validating and soothing their emotions without a real basis. This AI-aided emotional regulation can have negative consequences for users and society, tending towards a one-noted happiness defined as only the absence of “negative” emotions. We argue that we must carefully consider whether and how to respond to users’ emotions.

MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection
Amanda Cercas Curry | Giuseppe Attanasio | Debora Nozza | Dirk Hovy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We present the system proposed by the MilaNLP team for the Explainable Detection of Online Sexism (EDOS) shared task. We propose an ensemble modeling approach to combine different classifiers trained with domain adaptation objectives and standard fine-tuning. Our results show that the ensemble is more robust than individual models and that regularized models generate more “conservative” predictions, mitigating the effects of lexical overfitting. However, our error analysis also finds that many of the misclassified instances are debatable, raising questions about the objective annotatability of hate speech data.

2021

ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Abuse Detection in Conversational AI
Amanda Cercas Curry | Gavin Abercrombie | Verena Rieser
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We present the first English corpus study on abusive language towards three conversational AI systems gathered ‘in the wild’: an open-domain social bot, a rule-based chatbot, and a task-based system. To account for the complexity of the task, we take a more ‘nuanced’ approach where our ConvAI dataset reflects fine-grained notions of abuse, as well as views from multiple expert annotators. We find that the distribution of abuse is vastly different compared to other commonly used datasets, with more sexually tinted aggression towards the virtual persona of these systems. Finally, we report results from benchmarking existing models against this data. Unsurprisingly, we find that there is substantial room for improvement, with F1 scores below 90%.

Alexa, Google, Siri: What are Your Pronouns? Gender and Anthropomorphism in the Design and Perception of Conversational Assistants
Gavin Abercrombie | Amanda Cercas Curry | Mugdha Pandya | Verena Rieser
Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing

Technology companies have produced varied responses to concerns about the effects of the design of their conversational AI systems. Some have claimed that their voice assistants are in fact not gendered or human-like—despite design features suggesting the contrary. We compare these claims to user perceptions by analysing the pronouns they use when referring to AI assistants. We also examine systems’ responses and the extent to which they generate output which is gendered and anthropomorphic. We find that, while some companies appear to be addressing the ethical concerns raised, in some cases, their claims do not seem to hold true. In particular, our results show that system outputs are ambiguous as to the humanness of the systems, and that users tend to personify and gender them as a result.

2020

Conversational Assistants and Gender Stereotypes: Public Perceptions and Desiderata for Voice Personas
Amanda Cercas Curry | Judy Robertson | Verena Rieser
Proceedings of the Second Workshop on Gender Bias in Natural Language Processing

Conversational voice assistants are rapidly developing from purely transactional systems to social companions with “personality”. UNESCO recently stated that the female and submissive personality of current digital assistants gives cause for concern as it reinforces gender stereotypes. In this work, we present results from a participatory design workshop, where we invite people to submit their preferences for what their ideal persona might look like, both in drawings and in a multiple choice questionnaire. We find no clear consensus, which suggests that one possible solution is to let people configure/personalise their assistants. We then outline a multi-disciplinary project of how we plan to address the complex question of gender and stereotyping in digital assistants.

2019

A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents
Amanda Cercas Curry | Verena Rieser
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

How should conversational agents respond to verbal abuse by the user? To answer this question, we conduct a large-scale crowd-sourced evaluation of abuse response strategies employed by current state-of-the-art systems. Our results show that some strategies, such as “polite refusal”, score highly across the board, while for other strategies demographic factors, such as age, as well as the severity of the preceding abuse influence the user’s perception of which response is appropriate. In addition, we find that most data-driven models lag behind rule-based or commercial systems in terms of their perceived appropriateness.

2018

#MeToo Alexa: How Conversational Systems Respond to Sexual Harassment
Amanda Cercas Curry | Verena Rieser
Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing

Conversational AI systems, such as Amazon’s Alexa, are rapidly developing from purely transactional systems to social chatbots, which can respond to a wide variety of user requests. In this article, we establish how current state-of-the-art conversational systems react to inappropriate requests, such as bullying and sexual harassment on the part of the user, by collecting and analysing the novel #MeTooAlexa corpus. Our results show that commercial systems mainly avoid answering, while rule-based chatbots show a variety of behaviours and often deflect. Data-driven systems, on the other hand, are often non-coherent, but also run the risk of being interpreted as flirtatious and sometimes react with counter-aggression. This includes our own system, trained on “clean” data, which suggests that inappropriate system behaviour is not caused by data bias.

2017

Why We Need New Evaluation Metrics for NLG
Jekaterina Novikova | Ondřej Dušek | Amanda Cercas Curry | Verena Rieser
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: We investigate a wide range of metrics, including state-of-the-art word-based and novel grammar-based ones, and demonstrate that they only weakly reflect human judgements of system outputs as generated by data-driven, end-to-end NLG. We also show that metric performance is data- and system-specific. Nevertheless, our results also suggest that automatic metrics perform reliably at system-level and can support system development by finding cases where a system performs poorly.

2015

Generating and Evaluating Landmark-Based Navigation Instructions in Virtual Environments
Amanda Cercas Curry | Dimitra Gkatzia | Verena Rieser
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)

A Game-Based Setup for Data Collection and Task-Based Evaluation of Uncertain Information Presentation
Dimitra Gkatzia | Amanda Cercas Curry | Verena Rieser | Oliver Lemon
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)