Towards Fast and Accurate Modeling for Cross-Lingual Label Projection

Thang Le, Huy Huu Nguyen, Anh Tuan Luu, Thamar Solorio, Thien Huu Nguyen

Abstract

Information extraction (IE) systems rely on structured data for training, but such annotated data is highly imbalanced across languages, with low-resource languages receiving little attention. Label projection techniques aim to bridge this gap by transferring structured annotations from high-resource to low-resource languages. However, existing methods are either inaccurate or too slow for large-scale use. This work addresses the problem by developing a more effective method that remains efficient enough for large-scale projection. In particular, we propose to synthesize alignment sequence pairs and fine-tune an encoder model with a span alignment objective, while controlling data influence during training. Experimental results across 50+ languages show that our framework consistently outperforms previous state-of-the-art methods while maintaining fast inference speed. In addition, we introduce EXP, the first benchmark for explicit evaluation of label projection, thereby reducing confounders and non-determinism in method assessment.

Anthology ID: 2026.acl-long.1817
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 39175–39198
URL: https://aclanthology.org/2026.acl-long.1817/
Cite (ACL): Thang Le, Huy Huu Nguyen, Anh Tuan Luu, Thamar Solorio, and Thien Huu Nguyen. 2026. Towards Fast and Accurate Modeling for Cross-Lingual Label Projection. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 39175–39198, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Towards Fast and Accurate Modeling for Cross-Lingual Label Projection (Le et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1817.pdf
Checklist: 2026.acl-long.1817.checklist.pdf
