Giuseppe Attanasio


pdf bib
MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection
Amanda Cercas Curry | Giuseppe Attanasio | Debora Nozza | Dirk Hovy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We present the system proposed by the MilaNLP team for the Explainable Detection of Online Sexism (EDOS) shared task. We propose an ensemble modeling approach to combine different classifiers trained with domain adaptation objectives and standard fine-tuning. Our results show that the ensemble is more robust than individual models and that regularized models generate more “conservative” predictions, mitigating the effects of lexical overfitting.However, our error analysis also finds that many of the misclassified instances are debatable, raising questions about the objective annotatability of hate speech data.

pdf bib
ferret: a Framework for Benchmarking Explainers on Transformers
Giuseppe Attanasio | Eliana Pastor | Chiara Di Bonaventura | Debora Nozza
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

As Transformers are increasingly relied upon to solve complex NLP problems, there is an increased need for their decisions to be humanly interpretable. While several explainable AI (XAI) techniques for interpreting the outputs of transformer-based models have been proposed, there is still a lack of easy access to using and comparing them. We introduce ferret, a Python library to simplify the use and comparisons of XAI methods on transformer-based classifiers. With ferret, users can visualize and compare transformers-based models output explanations using state-of-the-art XAI methods on any free-text or existing XAI corpora. Moreover, users can also evaluate ad-hoc XAI metrics to select the most faithful and plausible explanations. To align with the recently consolidated process of sharing and using transformers-based models from Hugging Face, ferret interfaces directly with its Python library. In this paper, we showcase ferret to benchmark XAI methods used on transformers for sentiment analysis and hate speech detection. We show how specific methods provide consistently better explanations and are preferable in the context of transformer models.


pdf bib
MilaNLP at SemEval-2022 Task 5: Using Perceiver IO for Detecting Misogynous Memes with Text and Image Modalities
Giuseppe Attanasio | Debora Nozza | Federico Bianchi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this paper, we describe the system proposed by the MilaNLP team for the Multimedia Automatic Misogyny Identification (MAMI) challenge. We use Perceiver IO as a multimodal late fusion over unimodal streams to address both sub-tasks A and B. We build unimodal embeddings using Vision Transformer (image) and RoBERTa (text transcript). We enrich the input representation using face and demographic recognition, image captioning, and detection of adult content and web entities. To the best of our knowledge, this work is the first to use Perceiver IO combining text and image modalities. The proposed approach outperforms unimodal and multimodal baselines.

pdf bib
Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection
Giuseppe Attanasio | Debora Nozza | Eliana Pastor | Dirk Hovy
Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP

Transformer-based Natural Language Processing models have become the standard for hate speech detection. However, the unconscious use of these techniques for such a critical task comes with negative consequences. Various works have demonstrated that hate speech classifiers are biased. These findings have prompted efforts to explain classifiers, mainly using attribution methods. In this paper, we provide the first benchmark study of interpretability approaches for hate speech detection. We cover four post-hoc token attribution approaches to explain the predictions of Transformer-based misogyny classifiers in English and Italian. Further, we compare generated attributions to attention analysis. We find that only two algorithms provide faithful explanations aligned with human expectations. Gradient-based methods and attention, however, show inconsistent outputs, making their value for explanations questionable for hate speech detection tasks.

pdf bib
HATE-ITA: Hate Speech Detection in Italian Social Media Text
Debora Nozza | Federico Bianchi | Giuseppe Attanasio
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

Online hate speech is a dangerous phenomenon that can (and should) be promptly counteracted properly. While Natural Language Processing supplies appropriate algorithms for trying to reach this objective, all research efforts are directed toward the English language. This strongly limits the classification power on non-English languages. In this paper, we test several learning frameworks for identifying hate speech in Italian text. We release HATE-ITA, a multi-language model trained on a large set of English data and available Italian datasets. HATE-ITA performs better than mono-lingual models and seems to adapt well also on language-specific slurs. We hope our findings will encourage the research in other mid-to-low resource communities and provide a valuable benchmarking tool for the Italian community.

pdf bib
Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists
Giuseppe Attanasio | Debora Nozza | Dirk Hovy | Elena Baralis
Findings of the Association for Computational Linguistics: ACL 2022

Natural Language Processing (NLP) models risk overfitting to specific terms in the training data, thereby reducing their performance, fairness, and generalizability. E.g., neural hate speech detection models are strongly influenced by identity terms like gay, or women, resulting in false positives, severe unintended bias, and lower performance. Most mitigation techniques use lists of identity terms or samples from the target domain during training. However, this approach requires a-priori knowledge and introduces further bias if important terms are neglected. Instead, we propose a knowledge-free Entropy-based Attention Regularization (EAR) to discourage overfitting to training-specific terms. An additional objective function penalizes tokens with low self-attention entropy. We fine-tune BERT via EAR: the resulting model matches or exceeds state-of-the-art performance for hate speech classification and bias metrics on three benchmark corpora in English and Italian.EAR also reveals overfitting terms, i.e., terms most likely to induce bias, to help identify their effect on the model, task, and predictions.