Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding Are Both the Problem

Sara Court; Micha Elsner

Abstract

This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline. We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of information retrieved from a constrained database of digitized pedagogical materials (dictionaries and grammar lessons) and parallel corpora. Using both automatic and human evaluation of model output, we conduct ablation studies that manipulate (1) context type (morpheme translations, grammar descriptions, and corpus examples), (2) retrieval methods (automated vs. manual), and (3) model type. Our results suggest that even relatively small LLMs are capable of utilizing prompt context for zero-shot low-resource translation when provided a minimally sufficient amount of relevant linguistic information. However, the variable effects of prompt type, retrieval method, model type, and language community-specific factors highlight the limitations of using even the best LLMs as translation systems for the majority of the world’s 7,000+ languages and their speakers.

Anthology ID: 2024.wmt-1.125
Volume: Proceedings of the Ninth Conference on Machine Translation
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue: WMT
Publisher: Association for Computational Linguistics
Pages: 1332–1354
URL: https://aclanthology.org/2024.wmt-1.125
Cite (ACL): Sara Court and Micha Elsner. 2024. Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding Are Both the Problem. In Proceedings of the Ninth Conference on Machine Translation, pages 1332–1354, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding Are Both the Problem (Court & Elsner, WMT 2024)
PDF: https://aclanthology.org/2024.wmt-1.125.pdf
