LLMs Are Few-Shot In-Context Low-Resource Language Learners

Samuel Cahyawijaya; Holy Lovenia; Pascale Fung

doi:10.18653/v1/2024.naacl-long.24

Abstract

In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages using only short in-context information, offering a crucial avenue for narrowing the gap between high-resource and low-resource languages. Nonetheless, only a handful of works have explored ICL for low-resource languages, with most of them focusing on relatively high-resource languages, such as French and Spanish. In this work, we extensively study ICL and its cross-lingual variation (X-ICL) on 25 low-resource and 7 relatively higher-resource languages. Our study not only assesses the effectiveness of ICL with LLMs in low-resource languages but also identifies the shortcomings of in-context label alignment and introduces a more effective alternative: query alignment. Moreover, we provide valuable insights into various facets of ICL for low-resource languages. Our study concludes that few-shot in-context information is significant for enhancing the low-resource understanding quality of LLMs through semantically relevant information, by closing the language gap in the target language and aligning the semantics between the targeted low-resource language and the high-resource language that the model is proficient in. Our work highlights the importance of advancing ICL research, particularly for low-resource languages.

Anthology ID: 2024.naacl-long.24
Original: 2024.naacl-long.24v1
Version 2: 2024.naacl-long.24v2
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 405–433
URL: https://aclanthology.org/2024.naacl-long.24/
DOI: 10.18653/v1/2024.naacl-long.24
Cite (ACL): Samuel Cahyawijaya, Holy Lovenia, and Pascale Fung. 2024. LLMs Are Few-Shot In-Context Low-Resource Language Learners. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 405–433, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): LLMs Are Few-Shot In-Context Low-Resource Language Learners (Cahyawijaya et al., NAACL 2024)
PDF: https://aclanthology.org/2024.naacl-long.24.pdf
Video: https://aclanthology.org/2024.naacl-long.24.mp4
