Towards Adding Arabic to CorefUD

Dima Taji, Daniel Zeman


Abstract
Training models that perform well on various NLP tasks requires large amounts of data, and this need is even more acute for nuanced tasks such as anaphora and coreference resolution. This paper presents the creation of an Arabic CorefUD dataset by automatically converting the existing gold-annotated OntoNotes data.
Anthology ID:
2025.crac-1.6
Volume:
Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Maciej Ogrodniczuk, Michal Novák, Massimo Poesio, Sameer Pradhan, Vincent Ng
Venue:
CRAC
Publisher:
Association for Computational Linguistics
Pages:
70–76
URL:
https://aclanthology.org/2025.crac-1.6/
Cite (ACL):
Dima Taji and Daniel Zeman. 2025. Towards Adding Arabic to CorefUD. In Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference, pages 70–76, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Towards Adding Arabic to CorefUD (Taji & Zeman, CRAC 2025)
PDF:
https://aclanthology.org/2025.crac-1.6.pdf