Marco Polignano


2024

pdf bib
LLaMAntino against Cyber Intimate Partner Violence
Pierpaolo Basile | Marco Degemmis | Marco Polignano | Giovanni Semeraro | Lucia Siciliani | Vincenzo Tamburrano | Fabiana Battista | Rosa Scardigno
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

Intimate Partner Violence refers to the abusive behaviours perpetrated on their own partner. Unfortunately this is a social issue that has witnessed an increase over time, particularly after Covid-19. IPV be circumscribed into two broad categories known as Intimate Partner Violence (IPV) and Cyber Intimate Partner Violence (C-IPV). Social Media and technologies can exacerbate these types of behaviors but some “digital footprints”, such as textual conversations, can be exploited by Artificial Intelligence models to detect and, in turn, prevent them. With this aim in mind, this paper describes a scenario in which the Italian Language Model family LLAmAntino can be exploited to explain the presence of toxicity elements in conversations related to teenage relationships and then educate the interlocutor to recognize these elements in the messages received.

pdf bib
Unraveling the Enigma of SPLIT in Large-Language Models: The Unforeseen Impact of System Prompts on LLMs with Dissociative Identity Disorder
Marco Polignano | Marco De Gemmis | Giovanni Semeraro
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

Our work delves into the unexplored territory of Large-Language Models (LLMs) and their interactions with System Prompts, unveiling the previously undiscovered implications of SPLIT (System Prompt Induced Linguistic Transmutation) in commonly used state-of-the-art LLMs. Dissociative Identity Disorder, a complex and multifaceted mental health condition, is characterized by the presence of two or more distinct identities or personas within an individual, often with varying levels of awareness and control. The advent of large-language models has raised intriguing questions about the presence of such conditions in LLMs. Our research investigates the phenomenon of SPLIT, in which the System Prompt, a seemingly innocuous input, profoundly impacts the linguistic outputs of LLMs. The findings of our study reveal a striking correlation between the System Prompt and the emergence of distinct, persona-like linguistic patterns in the LLM’s responses. These patterns are not only reminiscent of the dissociative identities present in the original data but also exhibit a level of coherence and consistency that is uncommon in typical LLM outputs. As we continue to explore the capabilities of LLMs, it is imperative that we maintain a keen awareness of the potential for SPLIT and its significant implications for the development of more human-like and empathetic AI systems.

2022

pdf bib
An NLP Approach for the Analysis of Global Reporting Initiative Indexes from Corporate Sustainability Reports
Marco Polignano | Nicola Bellantuono | Francesco Paolo Lagrasta | Sergio Caputo | Pierpaolo Pontrandolfo | Giovanni Semeraro
Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference

Sustainability reporting has become an annual requirement in many countries and for certain types of companies. Sustainability reports inform stakeholders about companies’ commitment to sustainable development and their economic, social, and environmental sustainability practices. However, the fact that norms and standards allow a certain discretion to be adopted by drafting organizations makes such reports hardly comparable in terms of layout, disclosures, key performance indicators (KPIs), and so on. In this work, we present a system based on natural language processing and information extraction techniques to retrieve relevant information from sustainability reports, compliant with the Global Reporting Initiative Standards, written in Italian and English language. Specifically, the system is able to identify references to the various sustainability topics discussed by the reports: on which page of the document those references have been found, the context of each reference, and if it is mentioned positively or negatively. The output of the system has been then evaluated against a ground truth obtained through a manual annotation process on 134 reports. Experimental outcomes highlight the affordability of the approach for improving sustainability disclosures, accessibility, and transparency, thus empowering stakeholders to conduct further analysis and considerations.

2020

pdf bib
GM-CTSC at SemEval-2020 Task 1: Gaussian Mixtures Cross Temporal Similarity Clustering
Pierluigi Cassotti | Annalina Caputo | Marco Polignano | Pierpaolo Basile
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the system proposed by the Random team for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. We focus our approach on the detection problem. Given the semantics of words captured by temporal word embeddings in different time periods, we investigate the use of unsupervised methods to detect when the target word has gained or lost senses. To this end, we define a new algorithm based on Gaussian Mixture Models to cluster the target similarities computed over the two periods. We compare the proposed approach with a number of similarity-based thresholds. We found that, although the performance of the detection methods varies across the word embedding algorithms, the combination of Gaussian Mixture with Temporal Referencing resulted in our best system.

2019

pdf bib
SWAP at SemEval-2019 Task 3: Emotion detection in conversations through Tweets, CNN and LSTM deep neural networks
Marco Polignano | Marco de Gemmis | Giovanni Semeraro
Proceedings of the 13th International Workshop on Semantic Evaluation

Emotion detection from user-generated contents is growing in importance in the area of natural language processing. The approach we proposed for the EmoContext task is based on the combination of a CNN and an LSTM using a concatenation of word embeddings. A stack of convolutional neural networks (CNN) is used for capturing the hierarchical hidden relations among embedding features. Meanwhile, a long short-term memory network (LSTM) is used for capturing information shared among words of the sentence. Each conversation has been formalized as a list of word embeddings, in particular during experimental runs pre-trained Glove and Google word embeddings have been evaluated. Surface lexical features have been also considered, but they have been demonstrated to be not usefully for the classification in this specific task. The final system configuration achieved a micro F1 score of 0.7089. The python code of the system is fully available at https://github.com/marcopoli/EmoContext2019