Towards Practical and Knowledgeable LLMs for a Multilingual World: A Thesis Proposal

Bryan Li

doi:10.18653/v1/2025.naacl-srw.30

Abstract

The frontier of large language model (LLM) development has largely been substantiated by knowledge-intensive tasks specified in English. In this proposed thesis, I argue for the key role that multilinguality occupies in the development of practical and knowledgeable LLMs. First, I consider practical methods to improve LLMs’ performance on standard natural language processing (NLP) tasks by leveraging their existing multilingual knowledge. Then, I investigate the underlying multilingual knowledge of LLMs with two benchmarks: one on complex reasoning and one on territorial disputes. These benchmarks reveal LLMs’ inconsistent performance across languages. I then design efficient techniques, both at inference time and at training time, to address these discrepancies. Finally, I extend the territorial disputes benchmark to a retrieval-augmented generation (RAG) setting, comparing the effects of different retrieval settings on cross-lingual robustness. My proposal shows that informed use of multilinguality enhances both LLMs’ capabilities and our understanding of them.

Anthology ID: 2025.naacl-srw.30
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month: April
Year: 2025
Address: Albuquerque, USA
Editors: Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein
Venues: NAACL | WS
Publisher: Association for Computational Linguistics
Pages: 301–310
URL: https://aclanthology.org/2025.naacl-srw.30/
DOI: 10.18653/v1/2025.naacl-srw.30
Cite (ACL): Bryan Li. 2025. Towards Practical and Knowledgeable LLMs for a Multilingual World: A Thesis Proposal. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 301–310, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal): Towards Practical and Knowledgeable LLMs for a Multilingual World: A Thesis Proposal (Li, NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-srw.30.pdf
