@inproceedings{magnifico-2025-automated,
  title     = {Automated classification of causal relations. Evaluating different {LLM} performances.},
  author    = {Magnifico, Giacomo},
  editor    = {Velichkov, Boris and
               Nikolova-Koleva, Ivelina and
               Slavcheva, Milena},
  booktitle = {Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing},
  month     = sep,
  year      = {2025},
  address   = {Varna, Bulgaria},
  publisher = {INCOMA Ltd., Shoumen, Bulgaria},
  url       = {https://aclanthology.org/2025.ranlp-stud.4/},
  pages     = {27--36},
  abstract  = {The search for formal causal relations in natural language faces inherent limitations due to the lack of mathematically and logically informed datasets. Thus, the exploration of causal relations in natural language leads to the analysis of formal-logic-adjacent language patterns. Thanks to the recent advancements of generative LLMs, this research niche is expanding within the field of natural language processing and evaluation. In this work, we conduct an evaluation of 9 models produced by different AI developing companies in order to answer the question ``Are LLMs capable of discerning between different types of causal relations?''. The SciExpl dataset is chosen as a natural language corpus, and we develop three different prompt types aligned with zero-shot, few-shot, and chain-of-thought standards to evaluate the performance of the LLMs. Claude 3.7 Sonnet and Gemini 2.5 Flash Preview emerge as the best models for the task, with the respective highest F1 scores of 0.842 (few-shot prompting) and 0.846 (chain-of-thought prompting).},
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="magnifico-2025-automated">
<titleInfo>
<title>Automated classification of causal relations. Evaluating different LLM performances.</title>
</titleInfo>
<name type="personal">
<namePart type="given">Giacomo</namePart>
<namePart type="family">Magnifico</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-09</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Boris</namePart>
<namePart type="family">Velichkov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ivelina</namePart>
<namePart type="family">Nikolova-Koleva</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Milena</namePart>
<namePart type="family">Slavcheva</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>INCOMA Ltd., Shoumen, Bulgaria</publisher>
<place>
<placeTerm type="text">Varna, Bulgaria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>The search for formal causal relations in natural language faces inherent limitations due to the lack of mathematically and logically informed datasets. Thus, the exploration of causal relations in natural language leads to the analysis of formal-logic-adjacent language patterns. Thanks to the recent advancements of generative LLMs, this research niche is expanding within the field of natural language processing and evaluation. In this work, we conduct an evaluation of 9 models produced by different AI developing companies in order to answer the question “Are LLMs capable of discerning between different types of causal relations?”. The SciExpl dataset is chosen as a natural language corpus, and we develop three different prompt types aligned with zero-shot, few-shot, and chain-of-thought standards to evaluate the performance of the LLMs. Claude 3.7 Sonnet and Gemini 2.5 Flash Preview emerge as the best models for the task, with the respective highest F1 scores of 0.842 (few-shot prompting) and 0.846 (chain-of-thought prompting).</abstract>
<identifier type="citekey">magnifico-2025-automated</identifier>
<location>
<url>https://aclanthology.org/2025.ranlp-stud.4/</url>
</location>
<part>
<date>2025-09</date>
<extent unit="page">
<start>27</start>
<end>36</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Automated classification of causal relations. Evaluating different LLM performances.
%A Magnifico, Giacomo
%Y Velichkov, Boris
%Y Nikolova-Koleva, Ivelina
%Y Slavcheva, Milena
%S Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
%D 2025
%8 September
%I INCOMA Ltd., Shoumen, Bulgaria
%C Varna, Bulgaria
%F magnifico-2025-automated
%X The search for formal causal relations in natural language faces inherent limitations due to the lack of mathematically and logically informed datasets. Thus, the exploration of causal relations in natural language leads to the analysis of formal-logic-adjacent language patterns. Thanks to the recent advancements of generative LLMs, this research niche is expanding within the field of natural language processing and evaluation. In this work, we conduct an evaluation of 9 models produced by different AI developing companies in order to answer the question “Are LLMs capable of discerning between different types of causal relations?”. The SciExpl dataset is chosen as a natural language corpus, and we develop three different prompt types aligned with zero-shot, few-shot, and chain-of-thought standards to evaluate the performance of the LLMs. Claude 3.7 Sonnet and Gemini 2.5 Flash Preview emerge as the best models for the task, with the respective highest F1 scores of 0.842 (few-shot prompting) and 0.846 (chain-of-thought prompting).
%U https://aclanthology.org/2025.ranlp-stud.4/
%P 27-36
Markdown (Informal)
[Automated classification of causal relations. Evaluating different LLM performances.](https://aclanthology.org/2025.ranlp-stud.4/) (Magnifico, RANLP 2025)
ACL