Francielle Vargas

2026

Socially Responsible and Explainable Automated Fact-Checking and Hate Speech Detection
Francielle Vargas | Fabrício Benevenuto | Thiago A. S. Pardo
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2

This Ph.D. dissertation advances the state of the art in Natural Language Processing (NLP) for Portuguese by proposing new data resources and explainable methods for hate speech detection and automated fact-checking. The thesis introduces several benchmark datasets for Brazilian Portuguese (HateBR, HateBRXplain, HateBRMoralXplain, MFTCXplain, MOL, and FactNews), which have been widely adopted by the research community and address critical gaps in the availability of high-quality annotated resources for Portuguese. In addition, this dissertation proposes novel post-hoc and self-explaining NLP methods: Sentence-Level Factual Reasoning (SELFAR), Social Stereotype Analysis (SSA), Contextual Bag-of-Words with Interpretable Input and Feature Optimization (B+M), Supervised Rationale Attention (SRA), and Supervised Moral Rationale Attention (SMRA). Across multiple tasks and datasets in Portuguese, these methods outperform baselines while improving interpretability and robustness, demonstrating that explainability and performance can be jointly optimized. Finally, this thesis has achieved significant national and international impact, being cited by leading universities and research institutes worldwide and fostering new M.Sc. and Ph.D. research projects in Brazil. Its scientific and social contributions have also been recognized with multiple prestigious national and international awards, including the Google LARA, the Maria Carolina Monard Best Thesis Award in Artificial Intelligence, the Trevisan Prize for Students “AI for Good” from Bocconi University for rigorous computer science research in AI with social impact, and the Diversity and Inclusion Award from the Association for Computational Linguistics (ACL). The thesis has also received two nominations for the Brazilian Computer Society Thesis Awards, in Computer Science and in Multimedia, Hypermedia, and Web.

Existing hate speech detection models are often opaque and rely on surface-level lexical cues, which makes them vulnerable to spurious correlations and limits robustness, interpretability and cultural contextualization. We propose Supervised Moral Rationale Attention (SMRA), the first self-explaining hate speech detection framework to incorporate moral rationales as direct supervision for attention alignment. Based on Moral Foundations Theory, SMRA aligns token-level attention with expert-annotated moral rationales, guiding models to attend to morally salient spans. Unlike prior rationale-supervised or post-hoc approaches, SMRA integrates moral rationale supervision directly into the training objective, producing inherently interpretable and contextualized explanations. To support our framework, we also introduce HateBRMoralXplain, a Brazilian Portuguese benchmark dataset annotated with hate labels, moral categories, token-level moral rationales, and socio-political metadata. Across binary hate speech detection and multi-label moral sentiment classification, SMRA consistently improves performance while enhancing both faithful and plausible explanations. Although explanations become more concise, sufficiency decreases, indicating more compact and informative rationales. Fairness remains stable, suggesting that improvements in explanation quality do not introduce significant bias trade-offs.
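The idea of supervising attention with rationales can be illustrated with a small sketch. It assumes a joint objective made of a classification cross-entropy plus a term that pulls token-level attention toward a normalized rationale mask; the function names, the weight `lam`, and the exact form of the alignment term are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(probs, label):
    """Negative log-likelihood of the gold class."""
    return -float(np.log(probs[label] + 1e-12))

def attention_alignment_loss(attention, rationale_mask):
    """Cross-entropy between the normalized rationale mask (target)
    and the attention weights. Returns 0.0 when no token is marked."""
    mask = np.asarray(rationale_mask, dtype=float)
    if mask.sum() == 0:
        return 0.0
    target = mask / mask.sum()
    return -float(np.sum(target * np.log(np.asarray(attention) + 1e-12)))

def joint_loss(class_logits, label, attention_scores, rationale_mask, lam=1.0):
    """Hypothetical joint objective: classification loss plus a
    weighted attention-alignment term (lam is an assumed hyperparameter)."""
    probs = softmax(np.asarray(class_logits, dtype=float))
    attention = softmax(np.asarray(attention_scores, dtype=float))
    return cross_entropy(probs, label) + lam * attention_alignment_loss(attention, rationale_mask)

# Toy example: tokens 1 and 2 are marked as the rationale.
mask = [0, 1, 1, 0]
aligned = joint_loss([2.0, 0.0], 0, [0.0, 5.0, 5.0, 0.0], mask)
misaligned = joint_loss([2.0, 0.0], 0, [5.0, 0.0, 0.0, 5.0], mask)
print(aligned < misaligned)
```

Under this sketch, attention concentrated on the annotated span yields a lower loss than attention placed on unmarked tokens, which is the behavior a rationale-supervised objective is meant to encourage.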

2025

Ensuring the moral reasoning capabilities of Large Language Models (LLMs) is a growing concern as these systems are used in socially sensitive tasks. Nevertheless, current evaluation benchmarks present two major shortcomings: a lack of annotations that justify moral classifications, which limits transparency and interpretability; and a predominant focus on English, which constrains the assessment of moral reasoning across diverse cultural settings. In this paper, we introduce MFTCXplain, a multilingual benchmark dataset for evaluating the moral reasoning of LLMs via multi-hop hate speech explanations using the Moral Foundations Theory. MFTCXplain comprises 3,000 tweets across Portuguese, Italian, Persian, and English, annotated with binary hate speech labels, moral categories, and text span-level rationales. Our results show a misalignment between LLM outputs and human annotations in moral reasoning tasks. While LLMs perform well in hate speech detection (F1 up to 0.836), their ability to predict moral sentiments is notably weak (F1 < 0.35). Furthermore, rationale alignment remains limited, mainly in underrepresented languages. Our findings show the limited capacity of current LLMs to internalize and reflect human moral reasoning.

HateBRXplain: A Benchmark Dataset with Human-Annotated Rationales for Explainable Hate Speech Detection in Brazilian Portuguese
Isadora Salles | Francielle Vargas | Fabrício Benevenuto
Proceedings of the 31st International Conference on Computational Linguistics

Hate speech detection technologies are highly relevant in Brazil. Nevertheless, the inability of these technologies to provide reasons (rationales) for their decisions is a limiting factor for their adoption, since they may embed bias, which can perpetuate social inequalities when propagated at scale. This scenario highlights the urgency of proposing explainable technologies to address hate speech. However, explainable models depend heavily on the availability of data with human-annotated rationales, which are scarce, especially for low-resource languages. To fill this gap, we introduce HateBRXplain, the first benchmark dataset for hate speech detection in Portuguese with text span annotations capturing rationales. We evaluated our corpus using mBERT, BERTimbau, DistilBERTimbau, and PTT5 models, which outperformed the current baselines. We further assessed these models’ explainability using model-agnostic explanation methods (LIME and SHAP). Results demonstrate plausible post-hoc explanations when compared to human annotations. However, the best-performing hate speech detection models failed to provide faithful rationales.
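Plausibility, in the sense of agreement between a model explanation and human-annotated rationales, can be sketched as a token-level F1 score. The example below assumes that an explainer (for instance LIME or SHAP) has already produced one attribution score per token and that the top-k tokens are taken as the predicted rationale; the helper names and the choice of k are hypothetical and do not reproduce the paper's evaluation protocol.

```python
def top_k_tokens(attributions, k):
    """Indices of the k tokens with the largest attribution scores."""
    ranked = sorted(range(len(attributions)), key=lambda i: attributions[i], reverse=True)
    return set(ranked[:k])

def token_f1(predicted, gold):
    """Token-level F1 between predicted and human rationale token indices."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy example: per-token attribution scores for a four-token comment.
attributions = [0.05, 0.60, 0.10, 0.45]
human_rationale = {1, 3}  # token indices marked by annotators (illustrative)
predicted = top_k_tokens(attributions, k=2)
print(token_f1(predicted, human_rationale))
```

Faithfulness is a separate property (whether the highlighted tokens actually drive the prediction) and is not captured by this overlap measure.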

2024

We introduce the first expert-annotated corpus of Facebook comments for Hausa hate speech detection. The corpus, titled HausaHate, comprises 2,000 comments extracted from Western African Facebook pages and manually annotated by three Hausa native speakers who are also NLP experts. Our corpus was annotated using two different layers. We first labeled each comment according to a binary classification: offensive versus non-offensive. Then, offensive comments were also labeled according to hate speech targets: race, gender, and none. Lastly, a baseline model using a fine-tuned LLM for Hausa hate speech detection is presented, highlighting the challenges of hate speech detection tasks for indigenous languages in Africa, as well as future advances.

Improving Explainable Fact-Checking via Sentence-Level Factual Reasoning
Francielle Vargas | Isadora Salles | Diego Alves | Ameeta Agrawal | Thiago A. S. Pardo | Fabrício Benevenuto
Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER)

Most existing fact-checking systems are unable to explain their decisions by providing relevant rationales (justifications) for their predictions. This highlights a lack of transparency that poses significant risks, such as the prevalence of unexpected biases, which may increase political polarization due to limitations in impartiality. To address this critical gap, we introduce SEntence-Level FActual Reasoning (SELFAR), aimed at improving explainable fact-checking. SELFAR relies on fact extraction and verification by predicting the news source reliability and factuality (veracity) of news articles or claims at the sentence level, generating post-hoc explanations using SHAP/LIME and zero-shot prompts. Our experiments show that unreliable news stories predominantly consist of subjective statements, in contrast to reliable ones. Consequently, predicting unreliable news articles at the sentence level by analyzing impartiality and subjectivity is a promising approach for fact extraction and improving explainable fact-checking. Furthermore, LIME outperforms SHAP in explaining predictions on reliability. Additionally, while zero-shot prompts provide highly readable explanations and achieve an accuracy of 0.71 in predicting factuality, their tendency to hallucinate remains a challenge. Lastly, this paper also presents the first study on explainable fact-checking in the Portuguese language.

Extended Multimodal Hate Speech Event Detection During Russia-Ukraine Crisis - Shared Task at CASE 2024
Surendrabikram Thapa | Kritesh Rauniyar | Farhan Jafri | Hariram Veeramani | Raghav Jain | Sandesh Jain | Francielle Vargas | Ali Hürriyetoğlu | Usman Naseem
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

Addressing the need for effective hate speech moderation in contemporary digital discourse, the Multimodal Hate Speech Event Detection Shared Task made its debut at CASE 2023, co-located with RANLP 2023. Building upon its success, an extended version of the shared task was organized at the CASE workshop in EACL 2024. Similar to the earlier iteration, in this shared task, participants address hate speech detection through two subtasks. Subtask A is a binary classification problem, assessing whether text-embedded images contain hate speech. Subtask B goes further, demanding the identification of hate speech targets, such as individuals, communities, and organizations within text-embedded images. Performance is evaluated using the macro F1-score metric in both subtasks. With a total of 73 registered participants, the shared task witnessed remarkable achievements, with the best F1-scores in Subtask A and Subtask B reaching 87.27% and 80.05%, respectively, surpassing the leaderboard of the previous CASE 2023 shared task. This paper provides a comprehensive overview of the performance of seven teams that submitted results for Subtask A and five teams for Subtask B.
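For readers unfamiliar with the ranking metric, macro F1 is the unweighted mean of per-class F1 scores, so minority classes count as much as majority ones. The sketch below computes it from scratch; the label names and toy predictions are illustrative and are not shared task data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy binary example in the style of Subtask A (hypothetical labels).
y_true = ["hate", "hate", "no_hate", "no_hate"]
y_pred = ["hate", "no_hate", "no_hate", "no_hate"]
print(round(macro_f1(y_true, y_pred), 4))
```

Here the per-class scores are 2/3 for `hate` and 4/5 for `no_hate`, and the macro average is their mean.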

2023

Predicting Sentence-Level Factuality of News and Bias of Media Outlets
Francielle Vargas | Kokil Jaidka | Thiago Pardo | Fabrício Benevenuto
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Automated news credibility and fact-checking at scale require accurate prediction of news factuality and media bias. This paper introduces a large sentence-level dataset, titled “FactNews”, composed of 6,191 sentences expertly annotated according to factuality and media bias definitions proposed by AllSides. We use FactNews to assess the overall reliability of news sources by formulating two text classification problems for predicting sentence-level factuality of news reporting and bias of media outlets. Our experiments demonstrate that biased sentences contain more words than factual sentences and show a predominance of emotions. Hence, the fine-grained analysis of subjectivity and impartiality of news articles showed promising results for predicting the reliability of entire media outlets. Finally, due to the severity of fake news and political polarization in Brazil, and the lack of research for Portuguese, both the dataset and the baseline were proposed for Brazilian Portuguese.

Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes?
Francielle Vargas | Isabelle Carvalho | Ali Hürriyetoğlu | Thiago Pardo | Fabrício Benevenuto
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Recent studies have shown that hate speech technologies may propagate social stereotypes against marginalized groups. Nevertheless, there has been a lack of realistic approaches to assess and mitigate biased technologies. In this paper, we introduce a new approach to analyze the potential of hate-speech classifiers to reflect social stereotypes through the investigation of stereotypical beliefs by contrasting them with counter-stereotypes. We empirically measure the distribution of stereotypical beliefs by analyzing the distinctive classification of tuples containing stereotypes versus counter-stereotypes in machine learning models and datasets. Experiment results show that hate speech classifiers attribute unreal or negligent offensiveness to social identity groups by reflecting and reinforcing stereotypical beliefs regarding minorities. Furthermore, we also found that models that embed expert and context information from offensiveness markers present promising results to mitigate social stereotype bias towards socially responsible hate speech detection.
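The contrast between stereotypes and counter-stereotypes can be sketched as a difference in how often a classifier flags each side of a paired tuple. Everything in the example below is hypothetical: the keyword-based `toy_classifier`, the placeholder sentences, and the `stereotype_gap` measure are illustrative assumptions and do not reproduce the paper's models, data, or metrics.

```python
def offensive_rate(classifier, sentences):
    """Fraction of sentences the classifier labels as offensive (1)."""
    predictions = [classifier(s) for s in sentences]
    return sum(predictions) / len(predictions)

def stereotype_gap(classifier, pairs):
    """Offensive rate on stereotype sentences minus the rate on their
    counter-stereotype counterparts. A gap near 0 means both sides of
    each tuple are treated alike; a large gap means they are classified
    very differently."""
    stereotypes = [s for s, _ in pairs]
    counters = [c for _, c in pairs]
    return offensive_rate(classifier, stereotypes) - offensive_rate(classifier, counters)

# Hypothetical keyword classifier standing in for a trained model.
FLAGGED = {"lazy", "dishonest"}

def toy_classifier(sentence):
    return int(any(word in FLAGGED for word in sentence.lower().split()))

# Placeholder (stereotype, counter-stereotype) tuples about abstract groups.
pairs = [
    ("group X is lazy", "group X is hardworking"),
    ("group Y is dishonest", "group Y is honest"),
]
print(stereotype_gap(toy_classifier, pairs))
```

In practice `toy_classifier` would be replaced by the model under audit, and the tuples by a curated set of stereotype and counter-stereotype statements.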

NoHateBrazil: A Brazilian Portuguese Text Offensiveness Analysis System
Francielle Vargas | Isabelle Carvalho | Wolfgang Schmeisser-Nieto | Fabrício Benevenuto | Thiago Pardo
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Hate speech is a highly relevant problem in Brazil. Nevertheless, its regulation is not effective due to the difficulty of identifying, quantifying, and classifying offensive comments. Here, we introduce a novel system for offensive comment analysis in Brazilian Portuguese. The system, titled “NoHateBrazil”, recognizes explicit and implicit offensiveness in context at a fine-grained level. Specifically, we propose a framework for data collection, human annotation, and machine learning models that were used to build the system. In addition, we assess the potential of our system to reflect stereotypical beliefs against marginalized groups by contrasting them with counter-stereotypes. As a result, a user-friendly web application was implemented which, besides presenting relevant performance, showed promising results towards mitigating the risk of reinforcing social stereotypes. Lastly, new measures were proposed to improve the explainability of offensiveness classification and the reliability of the model’s predictions.

Multimodal Hate Speech Event Detection - Shared Task 4, CASE 2023
Surendrabikram Thapa | Farhan Jafri | Ali Hürriyetoğlu | Francielle Vargas | Roy Ka-Wei Lee | Usman Naseem
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

Ensuring the moderation of hate speech and its targets is a critical imperative within contemporary digital discourse. To this end, the shared task Multimodal Hate Speech Event Detection was organized at the sixth CASE workshop, co-located with RANLP 2023. The shared task has two subtasks. Subtask A required participants to treat hate speech detection as a binary problem, i.e., to detect whether a given text-embedded image contains hate speech. Subtask B required participants to identify the targets of hate speech (individual, community, or organization) in text-embedded images. For both subtasks, participants were ranked on the basis of the F1-score. The best F1-scores in Subtask A and Subtask B were 85.65 and 76.34, respectively. This paper provides a comprehensive overview of the performance of 13 teams that submitted results for Subtask A and 10 teams for Subtask B.

2022

HateBR: A Large Expert Annotated Corpus of Brazilian Instagram Comments for Offensive Language and Hate Speech Detection
Francielle Vargas | Isabelle Carvalho | Fabiana Rodrigues de Góes | Thiago Pardo | Fabrício Benevenuto
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Due to the severity of offensive and hateful comments on social media in Brazil, and the lack of research in Portuguese, this paper provides the first large-scale expert-annotated corpus of Brazilian Instagram comments for hate speech and offensive language detection. The HateBR corpus was collected from the comment section of Brazilian politicians’ accounts on Instagram and manually annotated by specialists, reaching a high inter-annotator agreement. The corpus consists of 7,000 documents annotated according to three different layers: a binary classification (offensive versus non-offensive comments), offensiveness-level classification (highly, moderately, and slightly offensive), and nine hate speech groups (xenophobia, racism, homophobia, sexism, religious intolerance, partyism, apology for the dictatorship, antisemitism, and fatphobia). We also implemented baseline experiments for offensive language and hate speech detection and compared them with a literature baseline. Results show that the baseline experiments on our corpus outperform the current state of the art for the Portuguese language.

Rhetorical Structure Approach for Online Deception Detection: A Survey
Francielle Vargas | Jonas D’Alessandro | Zohar Rabinovich | Fabrício Benevenuto | Thiago Pardo
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Most information is passed on in the form of language. Therefore, research on how people use language to inform and misinform, and on how this knowledge may be automatically extracted from large amounts of text, is highly relevant. This survey provides first-hand experiences and a comprehensive review of rhetorical-level structure analysis for online deception detection. We systematically analyze how discourse structure, aligned or not with other approaches, is applied to automatic fake news and fake review detection on the web and social media. Moreover, we categorize discourse-tagged corpora along with results, hence offering a summary and an accessible introduction for new researchers.

We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 and consists of four subtasks: i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension consists of expanding the test data with more data in previously available languages, namely, English, Hindi, Portuguese, and Spanish, and adding new test data in Mandarin, Turkish, and Urdu for Subtask 1, document classification. The training data from CASE 2021 in English, Portuguese, and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop accepts reports on systems developed for predicting test data of CASE 2021 as well. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for new languages in a zero-shot setting. The winning approaches are mainly ensembling models and merging data in multiple languages. The best two submissions on CASE 2021 data outperform submissions from last year for Subtask 1 and Subtask 2 in all languages. Only the following scenarios were not outperformed by new submissions on CASE 2021: Subtask 3 Portuguese and Subtask 4 English.

2021

Toward Discourse-Aware Models for Multilingual Fake News Detection
Francielle Vargas | Fabrício Benevenuto | Thiago Pardo
Proceedings of the Student Research Workshop Associated with RANLP 2021

Statements that are intentionally misstated (or manipulated) are of considerable interest to researchers, government, security, and financial systems. According to the deception literature, there are reliable cues for detecting deception, and the belief that liars give off cues that may indicate their deception is near-universal. Therefore, given that deceiving requires advanced cognitive development that honesty simply does not, and that people’s cognitive mechanisms offer promising guidance for deception detection, in this ongoing Ph.D. research we propose to examine discourse structure patterns in multilingual deceptive news corpora using the Rhetorical Structure Theory framework. Our work is the first to exploit multilingual discourse-aware strategies for fake news detection, and the research community currently lacks multilingual deceptive annotated corpora. Accordingly, this paper describes the current progress in this thesis, including (i) the construction of the first multilingual deceptive corpus, which was annotated by specialists according to the Rhetorical Structure Theory framework, and (ii) the introduction of two new proposed rhetorical relations, INTERJECTION and IMPERATIVE, which we assume to be relevant for the fake news detection task.

Contextual-Lexicon Approach for Abusive Language Detection
Francielle Vargas | Fabiana Rodrigues de Góes | Isabelle Carvalho | Fabrício Benevenuto | Thiago Pardo
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Since a lexicon-based approach is more elegant scientifically, explains the solution components, and is easier to generalize to other applications, this paper provides a new approach for offensive language and hate speech detection on social media, which embodies a lexicon of implicit and explicit offensive and swearing expressions annotated with contextual information. Due to the severity of abusive comments on social media in Brazil, and the lack of research in Portuguese, Brazilian Portuguese is the language used to validate the models. Nevertheless, our method may be applied to any other language. The conducted experiments show the effectiveness of the proposed approach, which outperforms the current baseline methods for the Portuguese language.
