McGill NLP Group Submission to the MRL 2024 Shared Task: Ensembling Enhances Effectiveness of Multilingual Small LMs

Senyu Li, Hao Yu, Jessica Ojo, David Ifeoluwa Adelani


Abstract
We present our systems for the three tasks and five languages included in the MRL 2024 Shared Task on Multilingual Multi-task Information Retrieval: (1) Named Entity Recognition, (2) Free-form Question Answering, and (3) Multiple-choice Question Answering. For each task, we explored the impact of selecting different multilingual language models for fine-tuning across various target languages, and implemented an ensemble system that generates final outputs based on predictions from multiple fine-tuned models. All models are large language models fine-tuned on task-specific data. Our experimental results show that a training dataset more balanced across languages yields better results. However, when training data for certain languages are scarce, fine-tuning on a large amount of English data supplemented by a small amount of “triggering data” in the target language can still produce decent results.
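The abstract describes an ensemble that produces final outputs from the predictions of several fine-tuned models. As a minimal illustrative sketch (not the paper's actual implementation), one common approach for classification-style tasks such as NER is per-position majority voting; the function name and the first-seen tie-breaking rule below are assumptions for illustration:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-example label predictions from several models by
    picking the most frequent label at each position.
    Ties break in favor of the label seen first (Counter insertion order)."""
    ensembled = []
    for labels in zip(*predictions):  # one tuple of labels per position
        ensembled.append(Counter(labels).most_common(1)[0][0])
    return ensembled

# Hypothetical example: three models' NER tags for a four-token sentence
model_a = ["B-PER", "O", "B-LOC", "O"]
model_b = ["B-PER", "O", "O",     "O"]
model_c = ["O",     "O", "B-LOC", "O"]
print(majority_vote([model_a, model_b, model_c]))
# → ['B-PER', 'O', 'B-LOC', 'O']
```

For free-form QA, where outputs are strings rather than labels, an ensemble would instead need a selection or reranking step over candidate answers.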
Anthology ID:
2024.mrl-1.28
Volume:
Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024)
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Jonne Sälevä, Abraham Owodunni
Venue:
MRL
Publisher:
Association for Computational Linguistics
Pages:
346–356
URL:
https://aclanthology.org/2024.mrl-1.28
Cite (ACL):
Senyu Li, Hao Yu, Jessica Ojo, and David Ifeoluwa Adelani. 2024. McGill NLP Group Submission to the MRL 2024 Shared Task: Ensembling Enhances Effectiveness of Multilingual Small LMs. In Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024), pages 346–356, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
McGill NLP Group Submission to the MRL 2024 Shared Task: Ensembling Enhances Effectiveness of Multilingual Small LMs (Li et al., MRL 2024)
PDF:
https://aclanthology.org/2024.mrl-1.28.pdf