Filippos Karolos Ventirozos
2025
Learn, Achieve, Predict, Propose, Forget, Suffer: Analysing and Classifying Anthropomorphisms of LLMs
Matthew Shardlow | Ashley Williams | Charlie Roadhouse | Filippos Karolos Ventirozos | Piotr Przybyła
Proceedings of the Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models
Anthropomorphism is a literary device in which human-like characteristics are attributed to non-human entities. However, the use of anthropomorphism in the scientific description and public communication of large language models could lead to misunderstanding amongst scientists and lay-people regarding the technical capabilities and limitations of these models. In this study, we present an analysis of anthropomorphised language commonly used to describe LLMs, showing that the presence of terms such as ‘learn’, ‘achieve’, ‘predict’ and ‘can’ is typically correlated with human labels of anthropomorphism. We also perform experiments to develop a sentence-level classification system for anthropomorphic descriptions of LLMs in scientific writing. We find that whilst a supervised RoBERTa-based system identifies anthropomorphisms with an F1-score of 0.564, state-of-the-art LLM-based approaches regularly overfit to the task.
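As a rough illustration of the sentence-level classification setup described in this abstract, the sketch below scores a single sentence with a RoBERTa sequence classifier via Hugging Face transformers. The checkpoint name, binary label mapping and example sentence are assumptions for illustration only, not the authors' trained model or annotated data; in practice the model would first be fine-tuned on the labelled sentences.

# Minimal sketch: sentence-level anthropomorphism classification with a
# RoBERTa classifier (illustrative setup, not the authors' released model).
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_NAME = "roberta-base"  # placeholder; fine-tune on annotated sentences first

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

sentence = "The model learns to predict the next token and achieves strong results."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Assumed label mapping: 0 = non-anthropomorphic, 1 = anthropomorphic
prediction = logits.argmax(dim=-1).item()
print("anthropomorphic" if prediction == 1 else "non-anthropomorphic")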
Aspect–Sentiment Quad Prediction with Distilled Large Language Models
Filippos Karolos Ventirozos | Peter Appleby | Matthew Shardlow
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Aspect-based sentiment analysis offers detailed insights by pinpointing the specific product aspects in a text that are associated with sentiments. This study explores the task through the prediction of quadruples comprising aspect, category, opinion, and polarity. We evaluated in-context learning strategies using recently released distilled large language models, ranging from zero-shot to full-dataset demonstrations. Our findings reveal that these models now perform between their earlier generations and the current state of the art, substantially exceeding the former. We also experimented with various chain-of-thought prompts, examining sequences such as aspect to category to sentiment in different orders; our results indicate that the optimal sequence differs from previous assumptions. Finally, we found that for quadruple prediction, few-shot demonstrations alone yield better performance than chain-of-thought prompting.
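As a rough sketch of the few-shot in-context learning setup evaluated in this abstract, the snippet below assembles a demonstration-based prompt for quadruple prediction. The demonstration review, category labels and output format are illustrative assumptions, not the paper's datasets or prompt templates.

# Minimal sketch: few-shot prompt construction for aspect-sentiment quad
# prediction (aspect, category, opinion, polarity). Demonstrations are
# invented for illustration, not drawn from the paper's data.
FEW_SHOT = [
    ("The battery life is excellent but the screen scratches easily.",
     [("battery life", "battery#general", "excellent", "positive"),
      ("screen", "display#quality", "scratches easily", "negative")]),
]

def build_prompt(review: str) -> str:
    lines = ["Extract (aspect, category, opinion, polarity) quadruples from each review."]
    for text, quads in FEW_SHOT:
        lines.append(f"Review: {text}")
        lines.append("Quadruples: " + "; ".join(str(q) for q in quads))
    lines.append(f"Review: {review}")
    lines.append("Quadruples:")
    return "\n".join(lines)

print(build_prompt("The pasta was delicious, though the service felt slow."))
# The assembled prompt would then be passed to a distilled LLM
# (e.g. via a text-generation API) to obtain the predicted quadruples.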