@inproceedings{el-mekki-abdul-mageed-2025-effective,
title = "Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with {LLM}s",
author = "El Mekki, Abdellah and
Abdul-Mageed, Muhammad",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-naacl.238/",
doi = "10.18653/v1/2025.findings-naacl.238",
pages = "4229--4256",
ISBN = "979-8-89176-195-7",
abstract = "Large Language Models (LLMs) have demonstrated impressive performance on a wide range of natural language processing (NLP) tasks, primarily through in-context learning (ICL). In ICL, the LLM is provided with examples that represent a given task such that it learns to generate answers for test inputs. However, access to these in-context examples is not guaranteed especially for low-resource or massively multilingual tasks. In this work, we propose an unsupervised approach to mine in-context examples for machine translation (MT), enabling unsupervised MT (UMT) across different languages. Our approach begins with word-level mining to acquire word translations that are then used to perform sentence-level mining. As the quality of mined parallel pairs may not be optimal due to noise or mistakes, we introduce a filtering criterion to select the optimal in-context examples from a pool of unsupervised parallel sentences. We evaluate our approach using two multilingual LLMs on 288 directions from the FLORES-200 dataset (CITATION) and analyze the impact of various linguistic features on performance. Our findings demonstrate the effectiveness of our unsupervised approach in mining in-context examples for MT, leading to better or comparable translation performance as translation with regular in-context samples (extracted from human-annotated data), while also outperforming the other state-of-the-art UMT methods by an average of 7 BLEU points."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="el-mekki-abdul-mageed-2025-effective">
<titleInfo>
<title>Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs</title>
</titleInfo>
<name type="personal">
<namePart type="given">Abdellah</namePart>
<namePart type="family">El Mekki</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Muhammad</namePart>
<namePart type="family">Abdul-Mageed</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-04</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: NAACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Luis</namePart>
<namePart type="family">Chiruzzo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alan</namePart>
<namePart type="family">Ritter</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lu</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Albuquerque, New Mexico</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-195-7</identifier>
</relatedItem>
<abstract>Large Language Models (LLMs) have demonstrated impressive performance on a wide range of natural language processing (NLP) tasks, primarily through in-context learning (ICL). In ICL, the LLM is provided with examples that represent a given task such that it learns to generate answers for test inputs. However, access to these in-context examples is not guaranteed especially for low-resource or massively multilingual tasks. In this work, we propose an unsupervised approach to mine in-context examples for machine translation (MT), enabling unsupervised MT (UMT) across different languages. Our approach begins with word-level mining to acquire word translations that are then used to perform sentence-level mining. As the quality of mined parallel pairs may not be optimal due to noise or mistakes, we introduce a filtering criterion to select the optimal in-context examples from a pool of unsupervised parallel sentences. We evaluate our approach using two multilingual LLMs on 288 directions from the FLORES-200 dataset (CITATION) and analyze the impact of various linguistic features on performance. Our findings demonstrate the effectiveness of our unsupervised approach in mining in-context examples for MT, leading to better or comparable translation performance as translation with regular in-context samples (extracted from human-annotated data), while also outperforming the other state-of-the-art UMT methods by an average of 7 BLEU points.</abstract>
<identifier type="citekey">el-mekki-abdul-mageed-2025-effective</identifier>
<identifier type="doi">10.18653/v1/2025.findings-naacl.238</identifier>
<location>
<url>https://aclanthology.org/2025.findings-naacl.238/</url>
</location>
<part>
<date>2025-04</date>
<extent unit="page">
<start>4229</start>
<end>4256</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs
%A El Mekki, Abdellah
%A Abdul-Mageed, Muhammad
%Y Chiruzzo, Luis
%Y Ritter, Alan
%Y Wang, Lu
%S Findings of the Association for Computational Linguistics: NAACL 2025
%D 2025
%8 April
%I Association for Computational Linguistics
%C Albuquerque, New Mexico
%@ 979-8-89176-195-7
%F el-mekki-abdul-mageed-2025-effective
%X Large Language Models (LLMs) have demonstrated impressive performance on a wide range of natural language processing (NLP) tasks, primarily through in-context learning (ICL). In ICL, the LLM is provided with examples that represent a given task such that it learns to generate answers for test inputs. However, access to these in-context examples is not guaranteed, especially for low-resource or massively multilingual tasks. In this work, we propose an unsupervised approach to mine in-context examples for machine translation (MT), enabling unsupervised MT (UMT) across different languages. Our approach begins with word-level mining to acquire word translations that are then used to perform sentence-level mining. As the quality of mined parallel pairs may not be optimal due to noise or mistakes, we introduce a filtering criterion to select the optimal in-context examples from a pool of unsupervised parallel sentences. We evaluate our approach using two multilingual LLMs on 288 directions from the FLORES-200 dataset (CITATION) and analyze the impact of various linguistic features on performance. Our findings demonstrate the effectiveness of our unsupervised approach in mining in-context examples for MT, leading to translation performance that is better than or comparable to translation with regular in-context samples (extracted from human-annotated data), while also outperforming other state-of-the-art UMT methods by an average of 7 BLEU points.
%R 10.18653/v1/2025.findings-naacl.238
%U https://aclanthology.org/2025.findings-naacl.238/
%U https://doi.org/10.18653/v1/2025.findings-naacl.238
%P 4229-4256
Markdown (Informal)
[Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs](https://aclanthology.org/2025.findings-naacl.238/) (El Mekki & Abdul-Mageed, Findings 2025)
ACL
Abdellah El Mekki and Muhammad Abdul-Mageed. 2025. Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 4229–4256, Albuquerque, New Mexico. Association for Computational Linguistics.