Birdie: Advancing State Space Language Modeling with Dynamic Mixtures of Training Objectives

Sam Blouir, Jimmy Smith, Antonios Anastasopoulos, Amarda Shehu


Abstract
Efficient state space models (SSMs), including linear recurrent neural networks and linear attention variants, have emerged as potential alternative language models to Transformers. While efficient, SSMs struggle with tasks requiring in-context retrieval, such as text copying and associative recall, limiting their usefulness in practical settings. Prior work on meeting this challenge has focused on internal model architecture and has not investigated the role of the training procedure. This paper proposes a new training procedure that improves the performance of SSMs on retrieval-intensive tasks. This novel pre-training procedure combines bidirectional processing of the input with dynamic mixtures of pre-training objectives to improve the utilization of the SSM's fixed-size state. Our experimental evaluations show that this procedure significantly improves performance on retrieval-intensive tasks that challenge current SSMs, such as phone book lookup, long paragraph question-answering, and infilling tasks. Our findings offer insights into a new direction to advance the training of SSMs to close the performance gap with Transformers.
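The phrase "dynamic mixtures of pre-training objectives" suggests sampling an objective per batch from weights that adapt during training. A minimal sketch of that general idea follows; the objective names and the reward-driven weight update here are illustrative assumptions, not the paper's actual procedure.

```python
import random

# Hypothetical objective names for illustration; the paper's actual
# mixture and its adaptation rule are more involved than this sketch.
OBJECTIVES = ["next_token_prediction", "span_infilling", "copying", "deshuffling"]


def sample_objective(weights):
    """Pick one pre-training objective for the next batch, proportional to weights."""
    return random.choices(OBJECTIVES, weights=weights, k=1)[0]


def update_weight(weights, idx, reward, lr=0.1):
    """Nudge the sampling weight of objective `idx` toward an observed reward
    (e.g. recent loss improvement on that objective), keeping weights positive."""
    new = list(weights)
    new[idx] = max(1e-3, new[idx] + lr * (reward - new[idx]))
    return new


# One step of the loop: sample an objective, train on it, update its weight.
weights = [1.0] * len(OBJECTIVES)
obj = sample_objective(weights)
weights = update_weight(weights, OBJECTIVES.index(obj), reward=1.5)
```

Over many steps, objectives that yield larger rewards are sampled more often, which is the basic mechanism a dynamic mixture relies on.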
Anthology ID:
2024.emnlp-main.541
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9679–9705
URL:
https://aclanthology.org/2024.emnlp-main.541
Cite (ACL):
Sam Blouir, Jimmy Smith, Antonios Anastasopoulos, and Amarda Shehu. 2024. Birdie: Advancing State Space Language Modeling with Dynamic Mixtures of Training Objectives. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9679–9705, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Birdie: Advancing State Space Language Modeling with Dynamic Mixtures of Training Objectives (Blouir et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.541.pdf