Tuğba Pamay Arslan
2026
CorefInst: Leveraging LLMs for Multilingual Coreference Resolution
Tuğba Pamay Arslan | Emircan Erol | Gülşen Eryiğit
Transactions of the Association for Computational Linguistics, Volume 14
Coreference Resolution (CR) is a crucial yet challenging task in natural language understanding, often constrained by task-specific architectures and encoder-based language models that demand extensive training and lack adaptability. This study introduces the first multilingual CR methodology that leverages decoder-only LLMs to handle both overt and zero mentions. The article explores how to model the CR task for LLMs via five different instruction sets using a controlled inference method. The approach is evaluated across three LLMs: Llama 3.1, Gemma 2, and Mistral 0.3. The results indicate that LLMs, when instruction-tuned with a suitable instruction set, can surpass state-of-the-art task-specific architectures. Specifically, our best model, a fully fine-tuned Llama 3.1 for multilingual CR, outperforms the leading multilingual CR model (i.e., the CorPipe 24 single-stage variant) by 2 percentage points on average across all languages in the CorefUD v1.2 dataset collection.
2023
Neural End-to-End Coreference Resolution using Morphological Information
Tuğba Pamay Arslan | Kutay Acar | Gülşen Eryiğit
Proceedings of the CRAC 2023 Shared Task on Multilingual Coreference Resolution
Incorporating Dropped Pronouns into Coreference Resolution: The case for Turkish
Tuğba Pamay Arslan | Gülşen Eryiğit
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
Representation of coreferential relations is a challenging and actively studied topic for pro-drop and morphologically rich languages (PD-MRLs) due to dropped pronouns (e.g., null subjects and omitted possessive pronouns). These phenomena require a representation scheme at the morphology level and enhanced evaluation methods. In this paper, we propose a representation and evaluation scheme to incorporate dropped pronouns into coreference resolution and validate it on the Turkish language. Using the scheme, we extend the annotations on the only existing Turkish coreference dataset, which originally did not contain annotations for dropped pronouns. We provide publicly available pre- and post-processors that extend the prominent CoNLL coreference scorer so that it also covers coreferential relations arising from dropped pronouns. As a final step, the paper reports the first neural Turkish coreference resolution results in the literature. Although validated on Turkish, the proposed scheme is language-independent and may be used for other PD-MRLs.