CLEAR: Cross-Lingual Enhancement in Retrieval via Reverse-training

Seungyoon Lee; Minhyuk Kim; Seongtae Hong; Youngjoon Jang; Dongsuk Oh; Heui-Seok Lim

Abstract

Existing multilingual embedding models often struggle in cross-lingual scenarios because linguistic resources are imbalanced and cross-lingual alignment receives limited attention during training. Although standard contrastive learning approaches to cross-lingual adaptation are widely adopted, they may fail to capture the fundamental alignment between languages and can degrade performance in well-aligned languages such as English. To address these challenges, we propose Cross-Lingual Enhancement in RetrievAl via Reverse-training (CLEAR), a novel loss function that uses a reverse training scheme to improve retrieval performance across diverse cross-lingual retrieval scenarios. CLEAR leverages an English passage as a bridge to strengthen the alignment between the target language and English, ensuring robust performance on the cross-lingual retrieval task. Our extensive experiments demonstrate that CLEAR achieves notable improvements in cross-lingual scenarios, with gains of up to 15%, particularly in low-resource languages, while minimizing performance degradation in English. Furthermore, our findings show that CLEAR remains effective in multilingual training, suggesting its potential for broad application and scalability. We release the code at https://github.com/dltmddbs100/CLEAR.

Anthology ID: 2026.acl-long.13
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 347–362
URL: https://aclanthology.org/2026.acl-long.13/
Cite (ACL): Seungyoon Lee, Minhyuk Kim, Seongtae Hong, Youngjoon Jang, Dongsuk Oh, and Heuiseok Lim. 2026. CLEAR: Cross-Lingual Enhancement in Retrieval via Reverse-training. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 347–362, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): CLEAR: Cross-Lingual Enhancement in Retrieval via Reverse-training (Lee et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.13.pdf
Checklist: 2026.acl-long.13.checklist.pdf
