Andrianos Michail


2024

pdf bib
Utilizing Large Language Models to Identify Evidence of Suicidality Risk through Analysis of Emotionally Charged Posts
Ahmet Yavuz Uluslu | Andrianos Michail | Simon Clematide
Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024)

This paper presents our contribution to the CLPsych 2024 shared task, focusing on the use of open-source large language models (LLMs) for suicide risk assessment through the analysis of social media posts. We achieved first place (out of 15 participating teams) in the task of providing summarized evidence of a user’s suicide risk. Our approach is based on Retrieval Augmented Generation (RAG), where we retrieve the top-k (k=5) posts with the highest emotional charge and provide the level of three different negative emotions (sadness, fear, anger) for each post during the generation phase.

2023

pdf bib
UZH_CLyp at SemEval-2023 Task 9: Head-First Fine-Tuning and ChatGPT Data Generation for Cross-Lingual Learning in Tweet Intimacy Prediction
Andrianos Michail | Stefanos Konstantinou | Simon Clematide
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes the submission of UZH_CLyp for the SemEval 2023 Task 9 “Multilingual Tweet Intimacy Analysis. We achieved second-best results in all 10 languages according to the official Pearson’s correlation regression evaluation measure. Our cross-lingual transfer learning approach explores the benefits of using a Head-First Fine-Tuning method (HeFiT) that first updates only the regression head parameters and then also updates the pre-trained transformer encoder parameters at a reduced learning rate. Additionally, we study the impact of using a small set of automatically generated examples (in our case, from ChatGPT) for low-resource settings where no human-labeled data is available. Our study shows that HeFiT stabilizes training and consistently improves results for pre-trained models that lack domain adaptation to tweets. Our study also shows a noticeable performance increase in cross-lingual learning when synthetic data is used, confirming the usefulness of current text generation systems to improve zeroshot baseline results. Finally, we examine how possible inconsistencies in the annotated data contribute to cross-lingual interference issues.