Marco Polignano

2025

Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)
Cristina Bosco | Elisabetta Jezek | Marco Polignano | Manuela Sanguinetti
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Preface
Cristina Bosco | Elisabetta Jezek | Marco Polignano | Manuela Sanguinetti
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Diffusion-Aided RAG: Elevating Dense-Retrieval Chatbots via Graph-Based Diffusion Reranking
Sai Teja Dampanaboina | Sai Nishchal Gamini | Karishma Kunwar | Marco Polignano | Marco Levantesi | Giovanni Semeraro | Ernesto William De Luca
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

2024

Intimate Partner Violence refers to abusive behaviours perpetrated against one's own partner. Unfortunately, this is a social issue that has increased over time, particularly after Covid-19. Such violence can be circumscribed into two broad categories, known as Intimate Partner Violence (IPV) and Cyber Intimate Partner Violence (C-IPV). Social media and technologies can exacerbate these types of behaviours, but some “digital footprints”, such as textual conversations, can be exploited by Artificial Intelligence models to detect and, in turn, prevent them. With this aim in mind, this paper describes a scenario in which the Italian Language Model family LLAmAntino can be exploited to explain the presence of toxicity elements in conversations related to teenage relationships and then educate the interlocutor to recognize these elements in the messages received.

Unraveling the Enigma of SPLIT in Large-Language Models: The Unforeseen Impact of System Prompts on LLMs with Dissociative Identity Disorder
Marco Polignano | Marco De Gemmis | Giovanni Semeraro
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Our work delves into the unexplored territory of Large-Language Models (LLMs) and their interactions with System Prompts, unveiling the previously undiscovered implications of SPLIT (System Prompt Induced Linguistic Transmutation) in commonly used state-of-the-art LLMs. Dissociative Identity Disorder, a complex and multifaceted mental health condition, is characterized by the presence of two or more distinct identities or personas within an individual, often with varying levels of awareness and control. The advent of large-language models has raised intriguing questions about the presence of such conditions in LLMs. Our research investigates the phenomenon of SPLIT, in which the System Prompt, a seemingly innocuous input, profoundly impacts the linguistic outputs of LLMs. The findings of our study reveal a striking correlation between the System Prompt and the emergence of distinct, persona-like linguistic patterns in the LLM’s responses. These patterns are not only reminiscent of the dissociative identities present in the original data but also exhibit a level of coherence and consistency that is uncommon in typical LLM outputs. As we continue to explore the capabilities of LLMs, it is imperative that we maintain a keen awareness of the potential for SPLIT and its significant implications for the development of more human-like and empathetic AI systems.

2023

On the Impact of Language Adaptation for Large Language Models: A Case Study for the Italian Language Using Only Open Resources
Pierpaolo Basile | Pierluigi Cassotti | Marco Polignano | Lucia Siciliani | Giovanni Semeraro
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

2022

An NLP Approach for the Analysis of Global Reporting Initiative Indexes from Corporate Sustainability Reports
Marco Polignano | Nicola Bellantuono | Francesco Paolo Lagrasta | Sergio Caputo | Pierpaolo Pontrandolfo | Giovanni Semeraro
Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference

Sustainability reporting has become an annual requirement in many countries and for certain types of companies. Sustainability reports inform stakeholders about companies’ commitment to sustainable development and their economic, social, and environmental sustainability practices. However, the fact that norms and standards allow a certain discretion to drafting organizations makes such reports hard to compare in terms of layout, disclosures, key performance indicators (KPIs), and so on. In this work, we present a system based on natural language processing and information extraction techniques to retrieve relevant information from sustainability reports, compliant with the Global Reporting Initiative Standards, written in Italian and English. Specifically, the system identifies references to the various sustainability topics discussed in the reports, the page of the document on which each reference is found, the context of each reference, and whether the topic is mentioned positively or negatively. The output of the system has then been evaluated against a ground truth obtained through a manual annotation process on 134 reports. Experimental outcomes highlight the affordability of the approach for improving sustainability disclosures, accessibility, and transparency, thus empowering stakeholders to conduct further analysis and considerations.

2020

A Deep Learning Model for the Analysis of Medical Reports in ICD-10 Clinical Coding Task
Marco Polignano | Pierpaolo Basile | Marco de Gemmis | Pasquale Lops | Giovanni Semeraro
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

GM-CTSC at SemEval-2020 Task 1: Gaussian Mixtures Cross Temporal Similarity Clustering
Pierluigi Cassotti | Annalina Caputo | Marco Polignano | Pierpaolo Basile
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the system proposed by the Random team for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. We focus our approach on the detection problem. Given the semantics of words captured by temporal word embeddings in different time periods, we investigate the use of unsupervised methods to detect when the target word has gained or lost senses. To this end, we define a new algorithm based on Gaussian Mixture Models to cluster the target similarities computed over the two periods. We compare the proposed approach with a number of similarity-based thresholds. We found that, although the performance of the detection methods varies across the word embedding algorithms, the combination of Gaussian Mixture with Temporal Referencing resulted in our best system.

2019

HateChecker: a Tool to Automatically Detect Hater Users in Online Social Networks
Cataldo Musto | Angelo Sansonetti | Marco Polignano | Giovanni Semeraro | Marco Stranisci
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets
Marco Polignano | Pierpaolo Basile | Marco de Gemmis | Giovanni Semeraro | Valerio Basile
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

SWAP at SemEval-2019 Task 3: Emotion detection in conversations through Tweets, CNN and LSTM deep neural networks
Marco Polignano | Marco de Gemmis | Giovanni Semeraro
Proceedings of the 13th International Workshop on Semantic Evaluation

Emotion detection from user-generated content is growing in importance in the area of natural language processing. The approach we proposed for the EmoContext task is based on the combination of a CNN and an LSTM using a concatenation of word embeddings. A stack of convolutional neural networks (CNN) is used for capturing the hierarchical hidden relations among embedding features. Meanwhile, a long short-term memory network (LSTM) is used for capturing information shared among words of the sentence. Each conversation has been formalized as a list of word embeddings; in particular, pre-trained GloVe and Google word embeddings have been evaluated during the experimental runs. Surface lexical features have also been considered, but they proved not to be useful for classification in this specific task. The final system configuration achieved a micro F1 score of 0.7089. The Python code of the system is fully available at https://github.com/marcopoli/EmoContext2019
