Björn Ross

Also published as: Bjorn Ross


2024

pdf bib
Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster
Agostina Calabrese | Leonardo Neves | Neil Shah | Maarten Bos | Björn Ross | Mirella Lapata | Francesco Barbieri
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Content moderators play a key role in keeping the conversation on social media healthy. While the high volume of content they need to judge represents a bottleneck to the moderation pipeline, no studies have explored how models could support them to make faster decisions. There is, by now, a vast body of research into detecting hate speech, sometimes explicitly motivated by a desire to help improve content moderation, but published research using real content moderators is scarce. In this work we investigate the effect of explanations on the speed of real-world moderators. Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators’ decision making time by 7.4%.

2023

pdf bib
Stereotypes and Smut: The (Mis)representation of Non-cisgender Identities by Text-to-Image Models
Eddie Ungless | Bjorn Ross | Anne Lauscher
Findings of the Association for Computational Linguistics: ACL 2023

Cutting-edge image generation has been praised for producing high-quality images, suggesting a ubiquitous future in a variety of applications. However, initial studies have pointed to the potential for harm due to predictive bias, reflecting and potentially reinforcing cultural stereotypes. In this work, we are the first to investigate how multimodal models handle diverse gender identities. Concretely, we conduct a thorough analysis in which we compare the output of three image generation models for prompts containing cisgender vs. non-cisgender identity terms. Our findings demonstrate that certain non-cisgender identities are consistently (mis)represented as less human, more stereotyped and more sexualised. We complement our experimental analysis with (a) a survey among non-cisgender individuals and (b) a series of interviews, to establish which harms affected individuals anticipate, and how they would like to be represented. We find respondents are particularly concerned about misrepresentation, and the potential to drive harmful behaviours and beliefs. Simple heuristics to limit offensive content are widely rejected, and instead respondents call for community involvement, curated training data and the ability to customise. These improvements could pave the way for a future where change is led by the affected community, and technology is used to positively ”[portray] queerness in ways that we haven’t even thought of”’ rather than reproducing stale, offensive stereotypes.

pdf bib
Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis
Seraphina Goldfarb-Tarrant | Björn Ross | Adam Lopez
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Sentiment analysis (SA) systems are widely deployed in many of the world’s languages, and there is well-documented evidence of demographic bias in these systems. In languages beyond English, scarcer training data is often supplemented with transfer learning using pre-trained models, including multilingual models trained on other languages. In some cases, even supervision data comes from other languages. Does cross-lingual transfer also import new biases? To answer this question, we use counterfactual evaluation to test whether gender or racial biases are imported when using cross-lingual transfer, compared to a monolingual transfer setting. Across five languages, we find that systems using cross-lingual transfer usually become more biased than their monolingual counterparts. We also find racial biases to be much more prevalent than gender biases. To spur further research on this topic, we release the sentiment models we used for this study, and the intermediate checkpoints throughout training, yielding 1,525 distinct models; we also release our evaluation code.

pdf bib
What Makes Good Counterspeech? A Comparison of Generation Approaches and Evaluation Metrics
Yi Zheng | Björn Ross | Walid Magdy
Proceedings of the 1st Workshop on CounterSpeech for Online Abuse (CS4OA)

Counterspeech has been proposed as a solution to the proliferation of online hate. Research has shown that natural language processing (NLP) approaches could generate such counterspeech automatically, but there are competing ideas for how NLP models might be used for this task and a variety of evaluation metrics whose relationship to one another is unclear. We test three different approaches and collect ratings of the generated counterspeech for 1,740 tweet-participant pairs to systematically compare the counterspeech on three aspects: quality, effectiveness and user preferences. We examine which model performs best at which metric and which aspects of counterspeech predict user preferences. A free-form text generation approach using ChatGPT performs the most consistently well, though its generations are occasionally unspecific and repetitive. In our experiment, participants’ preferences for counterspeech are predicted by the quality of the counterspeech, not its perceived effectiveness. The results can help future research approach counterspeech evaluation more systematically.

pdf bib
Rumour Detection in the Wild: A Browser Extension for Twitter
Andrej Jovanovic | Björn Ross
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

Rumour detection, particularly on social media, has gained popularity in recent years. The machine learning community has made significant contributions in investigating automatic methods to detect rumours on such platforms. However, these state-of-the-art (SoTA) models are often deployed by social media companies; ordinary end-users cannot leverage the solutions in the literature for their own rumour detection. To address this issue, we put forward a novel browser extension that allows these users to perform rumour detection on Twitter. Particularly, we leverage the performance from SoTA architectures, which has not been done previously. Initial results from a user study confirm that this browser extension provides benefit. Additionally, we examine the performance of our browser extension’s rumour detection model in a simulated deployment environment. Our results show that additional infrastructure for the browser extension is required to ensure its usability when deployed as a live service for Twitter users at scale.

pdf bib
Temporal Generalizability in Multimodal Misinformation Detection
Nataliya Stepanova | Björn Ross
Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP

Misinformation detection models degrade in performance over time, but the precise causes of this remain under-researched, in particular for multimodal models. We present experiments investigating the impact of temporal shift on performance of multimodal automatic misinformation detection classifiers. Working with the r/Fakeddit dataset, we found that evaluating models on temporally out-of-domain data (i.e. data from time stretches unseen in training) results in a non-linear, 7-8% drop in macro F1 as compared to traditional evaluation strategies (which do not control for the effect of content change over time). Focusing on two factors that make temporal generalizability in misinformation detection difficult, content shift and class distribution shift, we found that content shift has a stronger effect on recall. Within the context of coarse-grained vs. fine-grained misinformation detection with r/Fakeddit, we find that certain misinformation classes seem to be more stable with respect to content shift (e.g. Manipulated and Misleading Content). Our results indicate that future research efforts need to explicitly account for the temporal nature of misinformation to ensure that experiments reflect expected real-world performance.

2022

pdf bib
A Robust Bias Mitigation Procedure Based on the Stereotype Content Model
Eddie Ungless | Amy Rafferty | Hrichika Nag | Björn Ross
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)

The Stereotype Content model (SCM) states that we tend to perceive minority groups as cold, incompetent or both. In this paper we adapt existing work to demonstrate that the Stereotype Content model holds for contextualised word embeddings, then use these results to evaluate a fine-tuning process designed to drive a language model away from stereotyped portrayals of minority groups. We find the SCM terms are better able to capture bias than demographic agnostic terms related to pleasantness. Further, we were able to reduce the presence of stereotypes in the model through a simple fine-tuning procedure that required minimal human and computer resources, without harming downstream performance. We present this work as a prototype of a debiasing procedure that aims to remove the need for a priori knowledge of the specifics of bias in the model.

pdf bib
KEViN: A Knowledge Enhanced Validity and Novelty Classifier for Arguments
Ameer Saadat-Yazdi | Xue Li | Sandrine Chausson | Vaishak Belle | Björn Ross | Jeff Z. Pan | Nadin Kökciyan
Proceedings of the 9th Workshop on Argument Mining

The ArgMining 2022 Shared Task is concerned with predicting the validity and novelty of an inference for a given premise and conclusion pair. We propose two feed-forward network based models (KEViN1 and KEViN2), which combine features generated from several pretrained transformers and the WikiData knowledge graph. The transformers are used to predict entailment and semantic similarity, while WikiData is used to provide a semantic measure between concepts in the premise-conclusion pair. Our proposed models show significant improvement over RoBERTa, with KEViN1 outperforming KEViN2 and obtaining second rank on both subtasks (A and B) of the ArgMining 2022 Shared Task.

pdf bib
Explainable Abuse Detection as Intent Classification and Slot Filling
Agostina Calabrese | Björn Ross | Mirella Lapata
Transactions of the Association for Computational Linguistics, Volume 10

To proactively offer social media users a safe online experience, there is a need for systems that can detect harmful posts and promptly alert platform moderators. In order to guarantee the enforcement of a consistent policy, moderators are provided with detailed guidelines. In contrast, most state-of-the-art models learn what abuse is from labeled examples and as a result base their predictions on spurious cues, such as the presence of group identifiers, which can be unreliable. In this work we introduce the concept of policy-aware abuse detection, abandoning the unrealistic expectation that systems can reliably learn which phenomena constitute abuse from inspecting the data alone. We propose a machine-friendly representation of the policy that moderators wish to enforce, by breaking it down into a collection of intents and slots. We collect and annotate a dataset of 3,535 English posts with such slots, and show how architectures for intent classification and slot filling can be used for abuse detection, while providing a rationale for model decisions.1

2017

pdf bib
GraWiTas: a Grammar-based Wikipedia Talk Page Parser
Benjamin Cabrera | Laura Steinert | Björn Ross
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

Wikipedia offers researchers unique insights into the collaboration and communication patterns of a large self-regulating community of editors. The main medium of direct communication between editors of an article is the article’s talk page. However, a talk page file is unstructured and therefore difficult to analyse automatically. A few parsers exist that enable its transformation into a structured data format. However, they are rarely open source, support only a limited subset of the talk page syntax – resulting in the loss of content – and usually support only one export format. Together with this article we offer a very fast, lightweight, open source parser with support for various output formats. In a preliminary evaluation it achieved a high accuracy. The parser uses a grammar-based approach – offering a transparent implementation and easy extensibility.