Evaluating Multilingual Long-Context Models for Retrieval and Reasoning

Ameeta Agrawal, Andy Dang, Sina Bagheri Nezhad, Rhitabrat Pokharel, Russell Scheinberg


Abstract
Recent large language models (LLMs) demonstrate impressive capabilities in handling long contexts, some exhibiting near-perfect recall on synthetic retrieval tasks. However, these evaluations have mainly focused on English text and involved a single target sentence within lengthy contexts. Our work investigates how LLM performance generalizes to multilingual settings with multiple hidden target sentences. We create a new dataset – mLongRR – to comprehensively evaluate several multilingual long-context LLMs on retrieval and reasoning tasks across five languages: English, Vietnamese, Indonesian, Swahili, and Somali. These languages share the Latin script but belong to distinct language families and resource levels. Our analysis reveals a significant performance gap between languages. The best-performing models, such as Gemini-1.5 and GPT-4o, achieve accuracy ranging from around 96% in English to around 36% in Somali with a single target sentence. However, this accuracy drops to 40% in English and 0% in Somali when dealing with three target sentences. Our findings highlight the challenges long-context LLMs face as contexts grow longer, the number of target sentences increases, or the language's resource level decreases.
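To make the evaluation setup concrete, the sketch below illustrates a needle-in-a-haystack style construction in the spirit of the abstract: one or more target ("needle") sentences are inserted at random positions into a long distractor context, and a model's response is scored by whether it recovers all of them. The helper names (build_haystack, score_response), the insertion scheme, and the all-or-nothing scoring rule are illustrative assumptions, not the authors' actual mLongRR pipeline.

```python
import random


def build_haystack(filler_sentences, needles, context_size=50):
    """Insert one or more target ('needle') sentences at random positions
    inside a long distractor ('haystack') context.

    filler_sentences: distractor sentences in the test language.
    needles: target sentences the model must later retrieve.
    context_size: number of filler sentences (a stand-in for context
    length measured in tokens in a real benchmark).
    """
    context = random.choices(filler_sentences, k=context_size)
    positions = sorted(random.sample(range(len(context) + 1), k=len(needles)))
    # Insert from the largest position down so earlier indices stay valid.
    for pos, needle in zip(reversed(positions), reversed(needles)):
        context.insert(pos, needle)
    return " ".join(context)


def score_response(response, expected_answers):
    """All-or-nothing scoring: the response counts as correct only if
    every expected answer string appears in it."""
    return all(ans.lower() in response.lower() for ans in expected_answers)


if __name__ == "__main__":
    fillers = [
        "The sky was clear over the harbor.",
        "Markets opened later than usual that day.",
    ]
    needles = ["The special magic number for apples is 7481."]
    prompt = build_haystack(fillers, needles, context_size=20)
    print(prompt[:200], "...")
    # A hypothetical model response, scored against the expected answer.
    print(score_response("The magic number for apples is 7481.", ["7481"]))
```

With three needles instead of one, the same scoring rule requires the model to recover all three answers, which mirrors the sharper accuracy drop the abstract reports for multi-target settings.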
Anthology ID:
2024.mrl-1.18
Volume:
Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024)
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Jonne Sälevä, Abraham Owodunni
Venue:
MRL
Publisher:
Association for Computational Linguistics
Pages:
216–231
URL:
https://aclanthology.org/2024.mrl-1.18
Cite (ACL):
Ameeta Agrawal, Andy Dang, Sina Bagheri Nezhad, Rhitabrat Pokharel, and Russell Scheinberg. 2024. Evaluating Multilingual Long-Context Models for Retrieval and Reasoning. In Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024), pages 216–231, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Evaluating Multilingual Long-Context Models for Retrieval and Reasoning (Agrawal et al., MRL 2024)
PDF:
https://aclanthology.org/2024.mrl-1.18.pdf