Joan Plepi


2024

pdf bib
Do Multilingual Large Language Models Mitigate Stereotype Bias?
Shangrui Nie | Michael Fromm | Charles Welch | Rebekka Görge | Akbar Karimi | Joan Plepi | Nazia Mowmita | Nicolas Flores-Herr | Mehdi Ali | Lucie Flek
Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP

While preliminary findings indicate that multilingual LLMs exhibit reduced bias compared to monolingual ones, a comprehensive understanding of the effect of multilingual training on bias mitigation, is lacking. This study addresses this gap by systematically training six LLMs of identical size (2.6B parameters) and architecture: five monolingual models (English, German, French, Italian, and Spanish) and one multilingual model trained on an equal distribution of data across these languages, all using publicly available data. To ensure robust evaluation, standard bias benchmarks were automatically translated into the five target languages and verified for both translation quality and bias preservation by human annotators. Our results consistently demonstrate that multilingual training effectively mitigates bias. Moreover, we observe that multilingual models achieve not only lower bias but also superior prediction accuracy when compared to monolingual models with the same amount of training data, model architecture, and size.

pdf bib
Perspective Taking through Generating Responses to Conflict Situations
Joan Plepi | Charles Welch | Lucie Flek
Findings of the Association for Computational Linguistics: ACL 2024

Although language model performance across diverse tasks continues to improve, these models still struggle to understand and explain the beliefs of other people. This skill requires perspective-taking, the process of conceptualizing the point of view of another person. Perspective taking becomes challenging when the text reflects more personal and potentially more controversial beliefs.We explore this task through natural language generation of responses to conflict situations. We evaluate novel modifications to recent architectures for conditioning generation on an individual’s comments and self-disclosure statements. Our work extends the Social-Chem-101 corpus, using 95k judgements written by 6k authors from English Reddit data, for each of whom we obtained 20-500 self-disclosure statements. Our evaluation methodology borrows ideas from both personalized generation and theory of mind literature. Our proposed perspective-taking models outperform recent work, especially the twin encoder model conditioned on self-disclosures with high similarity to the conflict situation.

pdf bib
Corpus Considerations for Annotator Modeling and Scaling
Olufunke O. Sarumi | Béla Neuendorf | Joan Plepi | Lucie Flek | Jörg Schlötterer | Charles Welch
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Recent trends in natural language processing research and annotation tasks affirm a paradigm shift from the traditional reliance on a single ground truth to a focus on individual perspectives, particularly in subjective tasks. In scenarios where annotation tasks are meant to encompass diversity, models that solely rely on the majority class labels may inadvertently disregard valuable minority perspectives. This oversight could result in the omission of crucial information and, in a broader context, risk disrupting the balance within larger ecosystems. As the landscape of annotator modeling unfolds with diverse representation techniques, it becomes imperative to investigate their effectiveness with the fine-grained features of the datasets in view. This study systematically explores various annotator modeling techniques and compares their performance across seven corpora. From our findings, we show that the commonly used user token model consistently outperforms more complex models. We introduce a composite embedding approach and show distinct differences in which model performs best as a function of the agreement with a given dataset. Our findings shed light on the relationship between corpus statistics and annotator modeling performance, which informs future work on corpus construction and perspectivist NLP.

2023

pdf bib
Personalized Intended and Perceived Sarcasm Detection on Twitter
Joan Plepi | Magdalena Buski | Lucie Flek
Proceedings of the 3rd Workshop on Computational Linguistics for the Political and Social Sciences

2022

pdf bib
Unifying Data Perspectivism and Personalization: An Application to Social Norms
Joan Plepi | Béla Neuendorf | Lucie Flek | Charles Welch
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Instead of using a single ground truth for language processing tasks, several recent studies have examined how to represent and predict the labels of the set of annotators. However, often little or no information about annotators is known, or the set of annotators is small. In this work, we examine a corpus of social media posts about conflict from a set of 13k annotators and 210k judgements of social norms. We provide a novel experimental setup that applies personalization methods to the modeling of annotators and compare their effectiveness for predicting the perception of social norms. We further provide an analysis of performance across subsets of social situations that vary by the closeness of the relationship between parties in conflict, and assess where personalization helps the most.

pdf bib
Understanding Interpersonal Conflict Types and their Impact on Perception Classification
Charles Welch | Joan Plepi | Béla Neuendorf | Lucie Flek
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)

Studies on interpersonal conflict have a long history and contain many suggestions for conflict typology. We use this as the basis of a novel annotation scheme and release a new dataset of situations and conflict aspect annotations. We then build a classifier to predict whether someone will perceive the actions of one individual as right or wrong in a given situation. Our analyses include conflict aspects, but also generated clusters, which are human validated, and show differences in conflict content based on the relationship of participants to the author. Our findings have important implications for understanding conflict and social norms.

pdf bib
FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias
Flora Sakketou | Joan Plepi | Riccardo Cervero | Henri Jacques Geiss | Paolo Rosso | Lucie Flek
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. In this paper, we introduce a new contemporary Reddit dataset for fake news spreader analysis, called FACTOID, monitoring political discussions on Reddit since the beginning of 2020. The dataset contains over 4K users with 3.4M Reddit posts, and includes, beyond the users’ binary labels, also their fine-grained credibility level (very low to very high) and their political bias strength (extreme right to extreme left). As far as we are aware, this is the first fake news spreader dataset that simultaneously captures both the long-term context of users’ historical posts and the interactions between them. To create the first benchmark on our data, we provide methods for identifying misinformation spreaders by utilizing the social connections between the users along with their psycho-linguistic features. We show that the users’ social interactions can, on their own, indicate misinformation spreading, while the psycho-linguistic features are mostly informative in non-neural classification settings. In a qualitative analysis we observe that detecting affective mental processes correlates negatively with right-biased users, and that the openness to experience factor is lower for those who spread fake news.

pdf bib
Temporal Graph Analysis of Misinformation Spreaders in Social Media
Joan Plepi | Flora Sakketou | Henri-Jacques Geiss | Lucie Flek
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing

Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. Although the news domain is subject to rapid changes over time, the temporal dynamics of the spreaders’ language and network have not been explored yet. In this paper, we analyze the users’ time-evolving semantic similarities and social interactions and show that such patterns can, on their own, indicate misinformation spreading. Building on these observations, we propose a dynamic graph-based framework that leverages the dynamic nature of the users’ network for detecting fake news spreaders. We validate our design choice through qualitative analysis and demonstrate the contributions of our model’s components through a series of exploratory and ablative experiments on two datasets.

2021

pdf bib
Perceived and Intended Sarcasm Detection with Graph Attention Networks
Joan Plepi | Lucie Flek
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Existing sarcasm detection systems focus on exploiting linguistic markers, context, or user-level priors. However, social studies suggest that the relationship between the author and the audience can be equally relevant for the sarcasm usage and interpretation. In this work, we propose a framework jointly leveraging (1) a user context from their historical tweets together with (2) the social information from a user’s conversational neighborhood in an interaction graph, to contextualize the interpretation of the post. We use graph attention networks (GAT) over users and tweets in a conversation thread, combined with dense user history representations. Apart from achieving state-of-the-art results on the recently published dataset of 19k Twitter users with 30K labeled tweets, adding 10M unlabeled tweets as context, our results indicate that the model contributes to interpreting the sarcastic intentions of an author more than to predicting the sarcasm perception by others.

pdf bib
Conversational Question Answering over Knowledge Graphs with Transformer and Graph Attention Networks
Endri Kacupaj | Joan Plepi | Kuldeep Singh | Harsh Thakkar | Jens Lehmann | Maria Maleshkova
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

This paper addresses the task of (complex) conversational question answering over a knowledge graph. For this task, we propose LASAGNE (muLti-task semAntic parSing with trAnsformer and Graph atteNtion nEworks). It is the first approach, which employs a transformer architecture extended with Graph Attention Networks for multi-task neural semantic parsing. LASAGNE uses a transformer model for generating the base logical forms, while the Graph Attention model is used to exploit correlations between (entity) types and predicates to produce node representations. LASAGNE also includes a novel entity recognition module which detects, links, and ranks all relevant entities in the question context. We evaluate LASAGNE on a standard dataset for complex sequential question answering, on which it outperforms existing baselines averaged on all question types. Specifically, we show that LASAGNE improves the F1-score on eight out of ten question types; in some cases, the increase is more than 20% compared to state of the art (SotA).

pdf bib
Perceived and Intended Sarcasm Detection with Graph Attention Networks
Joan Plepi | Lucie Flek
Findings of the Association for Computational Linguistics: EMNLP 2021

Existing sarcasm detection systems focus on exploiting linguistic markers, context, or user-level priors. However, social studies suggest that the relationship between the author and the audience can be equally relevant for the sarcasm usage and interpretation. In this work, we propose a framework jointly leveraging (1) a user context from their historical tweets together with (2) the social information from a user’s neighborhood in an interaction graph, to contextualize the interpretation of the post. We distinguish between perceived and self-reported sarcasm identification. We use graph attention networks (GAT) over users and tweets in a conversation thread, combined with various dense user history representations. Apart from achieving state-of-the-art results on the recently published dataset of 19k Twitter users with 30K labeled tweets, adding 10M unlabeled tweets as context, our experiments indicate that the graph network contributes to interpreting the sarcastic intentions of the author more than to predicting the sarcasm perception by others.