Bridge the GAP: Multi-lingual Models For Ambiguous Pronominal Coreference Resolution in South Asian Languages

Rahothvarman P, Adith John Rajeev, Kaveri Anuranjana, Radhika Mamidi


Abstract
Coreference resolution, the process of determining what a referring expression (a pronoun or a noun phrase) refers to in discourse, is a critical aspect of natural language understanding. However, the development of computational models for coreference resolution in low-resource languages, such as the Dravidian (and more broadly all South Asian) languages, still remains a significant challenge due to the scarcity of annotated corpora in these languages. To address this data scarcity, we adopt a pipeline that translates the English GAP dataset into various South Asian languages, creating a multi-lingual coreference dataset mGAP. Our research aims to leverage this dataset and develop two novel models, namely the joint embedding model and the cross attention model for coreference resolution with Dravidian languages in mind. We also demonstrate that cross-attention captures pronoun-candidate relations better leading to improved coreference resolution. We also harness the similarity across South Asian languages via transfer learning in order to use high resource languages to learn coreference for low resource languages.
Anthology ID:
2025.chipsal-1.10
Volume:
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Kengatharaiyer Sarveswaran, Ashwini Vaidya, Bal Krishna Bal, Sana Shams, Surendrabikram Thapa
Venues:
CHiPSAL | WS
SIG:
Publisher:
International Committee on Computational Linguistics
Note:
Pages:
104–114
Language:
URL:
https://aclanthology.org/2025.chipsal-1.10/
DOI:
Bibkey:
Cite (ACL):
Rahothvarman P, Adith John Rajeev, Kaveri Anuranjana, and Radhika Mamidi. 2025. Bridge the GAP: Multi-lingual Models For Ambiguous Pronominal Coreference Resolution in South Asian Languages. In Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025), pages 104–114, Abu Dhabi, UAE. International Committee on Computational Linguistics.
Cite (Informal):
Bridge the GAP: Multi-lingual Models For Ambiguous Pronominal Coreference Resolution in South Asian Languages (P et al., CHiPSAL 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.chipsal-1.10.pdf