Topic-guided Example Selection for Domain Adaptation in LLM-based Machine Translation

Seth Aycock; Rachel Bawden


Abstract

Current machine translation (MT) systems perform well in the domains on which they were trained, but adaptation to unseen domains remains a challenge. Rather than fine-tuning on domain data or modifying the architecture for training, an alternative approach exploits large language models (LLMs), which perform well across NLP tasks, especially when presented with in-context examples. We focus on adapting a pre-trained LLM to a domain at inference time through in-context example selection. For MT, examples are usually selected at random from a development set, although some more recent methods select them on the more intuitive basis of similarity to the test source. We employ topic models to select examples based on abstract semantic relationships below the level of a domain. We test the relevance of these statistical models and use them to select informative examples even for out-of-domain inputs, experimenting on 7 diverse domains and 11 language pairs of differing resourcedness. Our method outperforms baselines on challenging multilingual out-of-domain tests, though it does not match the performance of strong baselines in the in-language setting. We find that adding few-shot examples and related keywords consistently improves translation quality, that example diversity must be balanced with source similarity, and that our pipeline is overly restrictive for example selection when a targeted development set is available.
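The abstract describes choosing in-context examples by topical similarity to the test source while balancing that similarity against example diversity, optionally adding related keywords to the prompt. The sketch below illustrates that general idea under stated assumptions; it is not the authors' pipeline. It assumes each sentence already has a topic distribution (for instance, inferred by an LDA topic model) and uses greedy Maximal Marginal Relevance (MMR), a swapped-in technique, to trade relevance off against redundancy. The function names (`select_examples`, `build_prompt`), the `lam` weight and the "English:/French:" prompt layout are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two topic distributions (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_examples(query_topics: Sequence[float],
                    pool_topics: Sequence[Sequence[float]],
                    k: int,
                    lam: float = 0.7) -> List[int]:
    """Hypothetical greedy MMR selection over precomputed topic vectors.

    lam close to 1.0 favours similarity to the test source;
    lam close to 0.0 favours diversity among the chosen examples.
    Returns indices into pool_topics, in selection order.
    """
    selected: List[int] = []
    candidates = list(range(len(pool_topics)))

    def mmr_score(i: int) -> float:
        relevance = cosine(query_topics, pool_topics[i])
        # Redundancy = highest similarity to any example already chosen.
        redundancy = max((cosine(pool_topics[i], pool_topics[j]) for j in selected),
                         default=0.0)
        return lam * relevance - (1.0 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


def build_prompt(source: str,
                 examples: Sequence[Tuple[str, str]],
                 keywords: Sequence[str],
                 src_lang: str = "English",
                 tgt_lang: str = "French") -> str:
    """Assemble a few-shot MT prompt with optional keywords (hypothetical format)."""
    lines: List[str] = []
    if keywords:
        lines.append("Keywords: " + ", ".join(keywords))
    for ex_src, ex_tgt in examples:
        lines.append(f"{src_lang}: {ex_src}")
        lines.append(f"{tgt_lang}: {ex_tgt}")
    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")  # the LLM continues from here
    return "\n".join(lines)


if __name__ == "__main__":
    query = [0.9, 0.1, 0.0]  # toy topic distribution of the test source
    pool = [[0.85, 0.15, 0.0], [0.88, 0.12, 0.0], [0.1, 0.1, 0.8], [0.5, 0.5, 0.0]]
    print(select_examples(query, pool, k=2, lam=0.7))  # [1, 0]: two near-duplicates
    print(select_examples(query, pool, k=2, lam=0.3))  # [1, 2]: a more diverse pair
    print(build_prompt("Hello", [("Good morning", "Bonjour")], ["greeting"]))
```

On the toy pool, a high `lam` returns the two examples closest to the source even though they are nearly identical to each other, while a lower `lam` swaps the second pick for a topically different example; this is one simple way to expose the similarity/diversity balance the abstract mentions.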

Anthology ID:: 2024.eacl-srw.13
Volume:: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
Month:: March
Year:: 2024
Address:: St. Julian’s, Malta
Editors:: Neele Falk, Sara Papi, Mike Zhang
Venue:: EACL
Publisher:: Association for Computational Linguistics
Pages:: 175–195
URL:: https://aclanthology.org/2024.eacl-srw.13/
Cite (ACL):: Seth Aycock and Rachel Bawden. 2024. Topic-guided Example Selection for Domain Adaptation in LLM-based Machine Translation. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 175–195, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):: Topic-guided Example Selection for Domain Adaptation in LLM-based Machine Translation (Aycock & Bawden, EACL 2024)
PDF:: https://aclanthology.org/2024.eacl-srw.13.pdf
Video:: https://aclanthology.org/2024.eacl-srw.13.mp4