Lazy Low-Resource Coreference Resolution: a Study on Leveraging Black-Box Translation Tools

Semere Kiros Bitew, Johannes Deleu, Chris Develder, Thomas Demeester


Abstract
Large annotated corpora for coreference resolution are available for few languages. For machine translation, however, strong black-box systems exist for many languages. We empirically explore the appealing idea of leveraging such translation tools for bootstrapping coreference resolution in languages with limited resources. Two scenarios are analyzed, in which a large coreference corpus in a high-resource language is used for coreference predictions in a smaller language, by machine translating either the training corpus or the test data. In our empirical evaluation of coreference resolution using the two scenarios on several medium-resource languages, we find no improvement over monolingual baseline models. Our analysis of the various sources of error inherent to the studied scenarios reveals that the quality of contemporary machine translation tools is in fact the main limiting factor.
Anthology ID:
2021.crac-1.6
Volume:
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venues:
CRAC | EMNLP
Publisher:
Association for Computational Linguistics
Pages:
57–62
URL:
https://aclanthology.org/2021.crac-1.6
DOI:
10.18653/v1/2021.crac-1.6
Cite (ACL):
Semere Kiros Bitew, Johannes Deleu, Chris Develder, and Thomas Demeester. 2021. Lazy Low-Resource Coreference Resolution: a Study on Leveraging Black-Box Translation Tools. In Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference, pages 57–62, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Lazy Low-Resource Coreference Resolution: a Study on Leveraging Black-Box Translation Tools (Bitew et al., CRAC 2021)
PDF:
https://aclanthology.org/2021.crac-1.6.pdf