Fine-Tuned Llama for Multilingual Text-to-Text Coreference Resolution

Jakub Hejman, Ondřej Pražák, Miloslav Konopík


Abstract
This paper describes our approach to the CRAC 2025 Shared Task on Multilingual Coreference Resolution. We compete in the LLM track, where systems are limited to generative text-to-text approaches. Our system is based on Llama 3.1-8B, fine-tuned to tag the document with coreference annotations. We make one significant modification to the text format provided by the organizers: the model relies on the syntactic head to represent mention spans. Additionally, we use joint pre-training, and we train the model to generate empty nodes. We provide an in-depth analysis of our models' performance, which reveals several implementation problems. Although our system ended up in last place overall, it achieved the best performance on 10 of the 22 datasets within the track. By fixing the discovered problems in the post-evaluation phase, we improved our results substantially, outperforming all systems in the LLM track and even some systems from the unconstrained track.
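As a rough illustration of the text-to-text setup described in the abstract, the sketch below shows how a document might be paired with a tagged target in which each mention's syntactic head carries a coreference cluster ID. The bracketed tag syntax and the build_example helper are hypothetical placeholders, not the organizers' format or the authors' actual implementation.

    # Illustrative sketch only: the cluster-ID markers below are hypothetical.
    # The idea: the model reads plain text and rewrites it with each mention's
    # syntactic head annotated with its coreference cluster ID.
    source = "Mary told her brother that she would call him later."
    target = "Mary[1] told her[1] brother[2] that she[1] would call him[2] later."

    def build_example(source_text: str, target_text: str) -> dict:
        """Pack one document into a prompt/completion pair for causal-LM fine-tuning."""
        return {
            "prompt": f"Annotate coreference:\n{source_text}\n",
            "completion": target_text,
        }

    print(build_example(source, target))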
Anthology ID: 2025.crac-1.12
Volume: Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference
Month: November
Year: 2025
Address: Suzhou, China
Editors: Maciej Ogrodniczuk, Michal Novák, Massimo Poesio, Sameer Pradhan, Vincent Ng
Venue: CRAC
Publisher: Association for Computational Linguistics
Pages: 140–148
URL: https://aclanthology.org/2025.crac-1.12/
Cite (ACL): Jakub Hejman, Ondřej Pražák, and Miloslav Konopík. 2025. Fine-Tuned Llama for Multilingual Text-to-Text Coreference Resolution. In Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference, pages 140–148, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Fine-Tuned Llama for Multilingual Text-to-Text Coreference Resolution (Hejman et al., CRAC 2025)
PDF: https://aclanthology.org/2025.crac-1.12.pdf