Leveraging CoHere Multilingual Embeddings and Inverted Softmax Retrieval for Automatic Parallel Sentence Alignment in Low-Resource Languages

Abubakar Auwal Khalid, Salisu Musa Borodo, Amina Abubakar Imam


Abstract
We present an improved method for automatic parallel sentence alignment in low-resource languages. We used CoHere multilingual embeddings and inverted softmax retrieval. Our technique achieved a higher F1-score of 78.30% on the MAFAND-MT test set, compared to the existing technique’s 54.75%. Precision and recall have shown similar performance. We assessed the quality of the extracted data by demonstrating that it outperforms the existing technique in terms of low-resource translation performance.
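As an illustration of the retrieval step named in the abstract, the sketch below shows one common formulation of inverted softmax over a cosine-similarity matrix: scores are normalized over all source sentences for each target sentence, rather than over targets for each source, which penalizes "hub" targets that look similar to everything. It is a minimal sketch, not the authors' implementation. The function name `inverted_softmax_align`, the temperature `beta`, and the `threshold` cutoff are assumptions introduced here, and the embeddings are taken as precomputed arrays (for example, from a multilingual embedding model) rather than fetched from any API.

```python
import numpy as np


def inverted_softmax_align(src_emb, tgt_emb, beta=10.0, threshold=0.0):
    """Align each source sentence to one target sentence (hypothetical helper).

    src_emb: array of shape (n_src, dim), precomputed sentence embeddings.
    tgt_emb: array of shape (n_tgt, dim), precomputed sentence embeddings.
    beta: assumed inverse temperature; larger values sharpen the distribution.
    threshold: assumed minimum score for keeping a candidate pair.
    Returns a list of (source_index, target_index, score) tuples.
    """
    src_emb = np.asarray(src_emb, dtype=float)
    tgt_emb = np.asarray(tgt_emb, dtype=float)

    # L2-normalize so that the dot product equals cosine similarity.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T  # shape (n_src, n_tgt)

    # Inverted softmax: normalize each column, i.e. over all source
    # sentences for a fixed target sentence.
    logits = beta * sim
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=0, keepdims=True)

    # For every source sentence, keep the target with the highest score.
    best = probs.argmax(axis=1)
    scores = probs[np.arange(src.shape[0]), best]
    return [
        (i, int(j), float(s))
        for i, (j, s) in enumerate(zip(best, scores))
        if s >= threshold
    ]
```

With toy inputs such as `inverted_softmax_align(np.eye(3), np.eye(3)[[2, 0, 1]])`, each source row is paired with the target row holding the same vector. In practice `beta` and `threshold` would need tuning on held-out aligned data, since the abstract does not state the values used.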
Anthology ID:
2026.africanlp-main.4
Volume:
Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Everlyn Asiko Chimoto, Constantine Lignos, Shamsuddeen Muhammad, Idris Abdulmumin, Clemencia Siro, David Ifeoluwa Adelani
Venues:
AfricaNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
37–43
URL:
https://aclanthology.org/2026.africanlp-main.4/
Cite (ACL):
Abubakar Auwal Khalid, Salisu Musa Borodo, and Amina Abubakar Imam. 2026. Leveraging CoHere Multilingual Embeddings and Inverted Softmax Retrieval for Automatic Parallel Sentence Alignment in Low-Resource Languages. In Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026), pages 37–43, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Leveraging CoHere Multilingual Embeddings and Inverted Softmax Retrieval for Automatic Parallel Sentence Alignment in Low-Resource Languages (Khalid et al., AfricaNLP 2026)
PDF:
https://aclanthology.org/2026.africanlp-main.4.pdf