MLT-DR: Multi-Lingual/Task Demonstration Retrieval
An Attempt towards Generalized Retriever for In-Context Learning

Kazuma Hashimoto, Arjun Reddy Akula, Karthik Raman, Michael Bendersky


Abstract
This paper presents Multi-Lingual/Task Demonstration Retrieval (MLT-DR) for in-context learning with Large Language Models (LLMs). Our goal is to investigate how well dense demonstration retrieval models generalize across languages and tasks. We first convert 81 tasks into a common format, covering various languages, task types, and domains. For 8 English-based tasks among them, we use machine translation to create synthetic multi/cross-lingual tasks by translating the examples into non-English languages, explicitly covering more than 130 languages. We then use an instruction-tuned LLM to estimate the utility of demonstrations for all the tasks, and use these estimates to train the demonstration retrieval models. In our experiments, we make an interesting, counterintuitive observation: computing demonstration embeddings from both the input and the ground-truth output hurts the retriever's ability to generalize to unseen tasks whose output space differs substantially from those in the seen task set. We also verify that our retriever works robustly even with LLMs that we did not use during the development of the models. The retrieval models' checkpoints are publicly available at URL-available-upon-publication.
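The abstract contrasts two ways of embedding a demonstration for retrieval: from its input alone, or from its input together with its ground-truth output. The following is a minimal sketch of where that choice enters a dense-retrieval loop. It assumes a toy bag-of-words `embed` function as a hypothetical stand-in for a trained dense encoder, and `demo_text`, `retrieve`, and the `use_output` flag are illustrative names only; this is not the paper's encoder, training objective, or the interface of the released checkpoints.

```python
import math
from collections import Counter

# Hypothetical stand-in for a trained dense encoder: a bag-of-words count vector.
# A real retriever would map text to a learned dense embedding instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def demo_text(demo: tuple[str, str], use_output: bool) -> str:
    # The text that gets embedded for a demonstration: its input alone,
    # or its input followed by its ground-truth output.
    x, y = demo
    return f"{x} {y}" if use_output else x

def retrieve(query: str, demos: list[tuple[str, str]], k: int, use_output: bool) -> list[int]:
    q = embed(query)
    scores = [cosine(q, embed(demo_text(d, use_output))) for d in demos]
    # Indices of the k highest-scoring demonstrations, best first.
    return sorted(range(len(demos)), key=lambda i: scores[i], reverse=True)[:k]

demos = [
    ("the film was wonderful", "positive"),
    ("the film was dull", "negative"),
    ("translate: good morning", "bonjour"),
]
print(retrieve("the film was wonderful", demos, k=2, use_output=False))  # → [0, 1]
```

With `use_output=True`, every demonstration's embedding also reflects its label vocabulary (here "positive" or "negative"), which a query never contains; this illustrates one way output-aware embeddings can tie a retriever to the output spaces seen during training.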
Anthology ID: 2024.mrl-1.27
Volume: Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024)
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Jonne Sälevä, Abraham Owodunni
Venue: MRL
Publisher: Association for Computational Linguistics
Pages: 324–345
URL: https://aclanthology.org/2024.mrl-1.27
Cite (ACL): Kazuma Hashimoto, Arjun Reddy Akula, Karthik Raman, and Michael Bendersky. 2024. MLT-DR: Multi-Lingual/Task Demonstration Retrieval An Attempt towards Generalized Retriever for In-Context Learning. In Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024), pages 324–345, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): MLT-DR: Multi-Lingual/Task Demonstration Retrieval An Attempt towards Generalized Retriever for In-Context Learning (Hashimoto et al., MRL 2024)
PDF: https://aclanthology.org/2024.mrl-1.27.pdf