2024
Simpler Becomes Harder: Do LLMs Exhibit a Coherent Behavior on Simplified Corpora?
Miriam Anschütz | Edoardo Mosca | Georg Groh
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024
Text simplification seeks to improve readability while retaining the original content and meaning. Our study investigates whether pre-trained classifiers also maintain such coherence by comparing their predictions on both original and simplified inputs. We conduct experiments using 11 pre-trained models, including BERT and OpenAI’s GPT-3.5, across six datasets spanning three languages. Additionally, we provide a detailed analysis of the correlation between prediction change rates and simplification types/strengths. Our findings reveal alarming inconsistencies across all languages and models. If not promptly addressed, simplified inputs can be easily exploited to craft zero-iteration, model-agnostic adversarial attacks with success rates of up to 50%.
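As a minimal sketch of the coherence check described in this abstract, the snippet below compares a pre-trained classifier's labels on original and simplified inputs and reports the prediction change rate. The sentiment model and the sentence pairs are illustrative assumptions, not the 11 models or six datasets used in the paper.

```python
from transformers import pipeline

# Hypothetical sentiment classifier; the paper evaluates 11 different pre-trained models.
classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

# (original, simplified) pairs -- invented examples, not taken from the paper's datasets.
pairs = [
    ("The film is an exquisitely layered meditation on loss.",
     "The film is about loss. It has many layers."),
    ("Its convoluted plot ultimately squanders a promising premise.",
     "The story is confusing. It wastes a good idea."),
]

labels_original = [r["label"] for r in classifier([orig for orig, _ in pairs])]
labels_simplified = [r["label"] for r in classifier([simp for _, simp in pairs])]

# A coherent model keeps its label; each flip is a potential zero-iteration attack.
flips = sum(a != b for a, b in zip(labels_original, labels_simplified))
print(f"Prediction change rate: {flips / len(pairs):.2%}")
```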
2023
IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models
Edoardo Mosca | Daryna Dementieva | Tohid Ebrahim Ajdari | Maximilian Kummeth | Kirill Gringauz | Yutong Zhou | Georg Groh
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations
Distinguishing Fact from Fiction: A Benchmark Dataset for Identifying Machine-Generated Scientific Papers in the LLM Era.
Edoardo Mosca | Mohamed Hesham Ibrahim Abdalla | Paolo Basso | Margherita Musumeci | Georg Groh
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
As generative NLP can now produce content nearly indistinguishable from human writing, it becomes difficult to identify genuine research contributions in academic writing and scientific publications. Moreover, information in NLP-generated text can potentially be factually wrong or even entirely fabricated. This study introduces a novel benchmark dataset, containing human-written and machine-generated scientific papers from SCIgen, GPT-2, GPT-3, ChatGPT, and Galactica. After describing the generation and extraction pipelines, we also experiment with four distinct classifiers as a baseline for detecting the authorship of scientific text. A strong focus is put on generalization capabilities and explainability to highlight the strengths and weaknesses of detectors. We believe our work serves as an important step towards creating more robust methods for distinguishing between human-written and machine-generated scientific papers, ultimately ensuring the integrity of scientific literature.
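As a rough illustration of what a baseline detector for such a benchmark could look like, the sketch below trains a TF-IDF plus logistic regression classifier to separate human-written from machine-generated text. The toy samples and labels are invented stand-ins for the actual dataset, and this is not necessarily one of the four classifiers evaluated in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins: 0 = human-written, 1 = machine-generated (placeholder data, not the benchmark).
texts = [
    "We study the effect of subword segmentation on low-resource machine translation.",
    "Our error analysis shows that most failures stem from unseen named entities.",
    "Annotation guidelines were refined over three pilot rounds with two linguists.",
    "The proposed novel method proposes a novel framework of the proposed approach.",
    "In this paper, this paper presents the results of the results that were obtained.",
    "The experimental experiments demonstrate the effectiveness of the effectiveness.",
]
labels = [0, 0, 0, 1, 1, 1]

# Simple bag-of-ngrams baseline for authorship of scientific text.
detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

print(detector.predict(["This paper proposes the proposed method of the proposed model."]))
```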
“Dr LLM, what do I have?”: The Impact of User Beliefs and Prompt Formulation on Health Diagnoses
Wojciech Kusa | Edoardo Mosca | Aldo Lipani
Proceedings of the Third Workshop on NLP for Medical Conversations
2022
Detecting Word-Level Adversarial Text Attacks via SHapley Additive exPlanations
Lukas Huber | Marc Alexander Kühn | Edoardo Mosca | Georg Groh
Proceedings of the 7th Workshop on Representation Learning for NLP
State-of-the-art machine learning models are prone to adversarial attacks: maliciously crafted inputs that fool the model into making a wrong prediction, often with high confidence. While defense strategies have been extensively explored in the computer vision domain, research in natural language processing still lacks techniques to make models resilient to adversarial text inputs. We adapt a technique from computer vision to detect word-level attacks targeting text classifiers. This method relies on training an adversarial detector leveraging Shapley additive explanations and outperforms the current state-of-the-art on two benchmarks. Furthermore, we show that the detector requires only a small number of training samples and, in some cases, generalizes to different datasets without needing to retrain.
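A simplified sketch of the detection idea: represent each input by a fixed-length vector of per-word attribution scores and train a classifier to separate clean from adversarial examples. To stay self-contained, the sketch uses leave-one-word-out occlusion as a cheap stand-in for Shapley additive explanations, and the model name, example texts, and detector are assumptions rather than the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from transformers import pipeline

# Hypothetical target classifier; top_k=None returns scores for every label.
clf = pipeline("text-classification",
               model="distilbert-base-uncased-finetuned-sst-2-english",
               top_k=None)

def positive_score(text):
    """Probability assigned to the POSITIVE class."""
    scores = clf([text])[0]
    return next(s["score"] for s in scores if s["label"] == "POSITIVE")

def attribution_vector(text, max_len=20):
    """Leave-one-word-out importance scores, padded/truncated to a fixed length.
    (The paper uses SHAP values; occlusion keeps this sketch dependency-light.)"""
    words = text.split()
    base = positive_score(text)
    attrs = [base - positive_score(" ".join(words[:i] + words[i + 1:]))
             for i in range(len(words))]
    return np.array((attrs + [0.0] * max_len)[:max_len])

# Clean and adversarial texts would normally come from a word-level attack such as
# TextFooler; these short examples are placeholders.
clean_texts = ["a genuinely moving and well acted drama",
               "a dull and lifeless script"]
adversarial_texts = ["a genuinely poignant and well acted dramaturgy",
                     "a dreary and lifeless screenplay"]

X = np.stack([attribution_vector(t) for t in clean_texts + adversarial_texts])
y = np.array([0] * len(clean_texts) + [1] * len(adversarial_texts))

# The detector is trained on attribution patterns, not on the raw text itself.
detector = LogisticRegression().fit(X, y)
print(detector.predict(X))  # sanity check on the (tiny) training set
```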
“That Is a Suspicious Reaction!”: Interpreting Logits Variation to Detect NLP Adversarial Attacks
Edoardo Mosca | Shreyash Agarwal | Javier Rando Ramírez | Georg Groh
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in computer vision has been carried out to develop reliable defense strategies. However, the same issue remains less explored in natural language processing. Our work presents a model-agnostic detector of adversarial text examples. The approach identifies patterns in the logits of the target classifier when perturbing the input text. The proposed detector improves the current state-of-the-art performance in recognizing adversarial inputs and exhibits strong generalization capabilities across different NLP models, datasets, and word-level attacks.
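A hedged sketch of the "suspicious reaction" signal: for each word, replace it with the tokenizer's UNK token, record how the logit of the originally predicted class moves, and use the sorted, padded variation profile as input to a downstream detector. The model name, perturbation choice, and profile length are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical target classifier.
name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

def logits(text):
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**enc).logits[0]

def reaction_profile(text, max_len=20):
    """How much the logit of the predicted class drops when each word is replaced by UNK."""
    words = text.split()
    base = logits(text)
    pred = int(base.argmax())
    diffs = []
    for i in range(len(words)):
        perturbed = " ".join(words[:i] + [tokenizer.unk_token] + words[i + 1:])
        diffs.append(float(base[pred] - logits(perturbed)[pred]))
    diffs.sort(reverse=True)                     # order-invariant signature
    return (diffs + [0.0] * max_len)[:max_len]   # fixed-length feature vector for a detector

print(reaction_profile("an unexpectedly tender and rewarding little film"))
```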
Explaining Neural NLP Models for the Joint Analysis of Open-and-Closed-Ended Survey Answers
Edoardo Mosca | Katharina Harmann | Tobias Eder | Georg Groh
Proceedings of the 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022)
Large-scale surveys are a widely used instrument to collect data from a target audience. Beyond the single individual, an appropriate analysis of the answers can reveal trends and patterns and thus generate new insights and knowledge for researchers. Current analysis practices employ shallow machine learning methods or rely on (biased) human judgment. This work investigates the usage of state-of-the-art NLP models such as BERT to automatically extract information from both open- and closed-ended questions. We also leverage explainability methods at different levels of granularity to further derive knowledge from the analysis model. Experiments on EMS, a survey-based study investigating the factors that influence a student’s career goals, show that the proposed approach can identify such factors both at the input level and at the higher concept level.
GrammarSHAP: An Efficient Model-Agnostic and Structure-Aware NLP Explainer
Edoardo Mosca | Defne Demirtürk | Luca Mülln | Fabio Raffagnato | Georg Groh
Proceedings of the First Workshop on Learning with Natural Language Supervision
Interpreting NLP models is fundamental for their development as it can shed light on hidden properties and unexpected behaviors. However, while transformer architectures exploit contextual information to enhance their predictive capabilities, most of the available methods to explain such predictions only provide importance scores at the word level. This work addresses the lack of feature attribution approaches that also take into account the sentence structure. We extend the SHAP framework by proposing GrammarSHAP—a model-agnostic explainer leveraging the sentence’s constituency parsing to generate hierarchical importance scores.
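As a loose illustration of structure-aware attribution, the sketch below scores every constituent of a hand-written parse tree by occluding its span and measuring the change in the classifier's output, which yields phrase-level rather than only word-level importance. This occlusion scoring is a simplification; GrammarSHAP itself builds on the SHAP framework, and the parse, model, and sentence here are assumptions.

```python
from nltk import Tree
from transformers import pipeline

# Hypothetical classifier; top_k=None returns scores for every label.
clf = pipeline("text-classification",
               model="distilbert-base-uncased-finetuned-sst-2-english",
               top_k=None)

def positive_score(text):
    scores = clf([text])[0]
    return next(s["score"] for s in scores if s["label"] == "POSITIVE")

# Hand-written parse of "the acting is great"; a constituency parser would normally produce this.
parse = Tree.fromstring("(S (NP (DT the) (NN acting)) (VP (VBZ is) (ADJP (JJ great))))")
words = parse.leaves()
base = positive_score(" ".join(words))

# Score every constituent by removing its span (naive word matching; fine for this toy sentence).
for subtree in parse.subtrees():
    span = set(subtree.leaves())
    reduced = " ".join(w for w in words if w not in span)
    importance = base - (positive_score(reduced) if reduced else 0.0)
    print(f"{subtree.label():<4} {' '.join(subtree.leaves()):<18} {importance:+.3f}")
```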
SHAP-Based Explanation Methods: A Review for NLP Interpretability
Edoardo Mosca | Ferenc Szigeti | Stella Tragianni | Daniel Gallagher | Georg Groh
Proceedings of the 29th International Conference on Computational Linguistics
Model explanations are crucial for the transparent, safe, and trustworthy deployment of machine learning models. The SHapley Additive exPlanations (SHAP) framework is considered by many to be a gold standard for local explanations thanks to its solid theoretical background and general applicability. In the years following its publication, several variants appeared in the literature—presenting adaptations in the core assumptions and target applications. In this work, we review all relevant SHAP-based interpretability approaches available to date and provide instructive examples as well as recommendations regarding their applicability to NLP use cases.
2021
Understanding and Interpreting the Impact of User Context in Hate Speech Detection
Edoardo Mosca | Maximilian Wich | Georg Groh
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media
As hate speech spreads on social media and online communities, research on its automatic detection continues. Recently, recognition performance has been increasing thanks to advances in deep learning and the integration of user features. This work investigates the effects that such features can have on a detection model. Unlike previous research, we show that simple performance comparison does not expose the full impact of including contextual and user information. By leveraging explainability techniques, we show (1) that user features play a role in the model’s decision and (2) how they affect the feature space learned by the model. Besides revealing that, and also illustrating why, user features are the reason for performance gains, we show how such techniques can be combined to better understand the model and to detect unintended bias.