Acquiring Bidirectionality via Large and Small Language Models

Takumi Goto, Hiroyoshi Nagao, Yuta Koreeda


Abstract
Using token representations from bidirectional language models (LMs) such as BERT remains a widely used approach for token-classification tasks. Even though there exist far larger unidirectional LMs such as Llama-2, they are rarely used to replace the token representations of bidirectional LMs. In this work, we hypothesize that their lack of bidirectionality is what is holding unidirectional LMs back. To that end, we propose training a new, small backward LM and concatenating its representations with those of an existing LM for downstream tasks. Through experiments on token-classification tasks, we demonstrate that introducing a backward model can improve benchmark performance by more than 10 points. Furthermore, we show that the proposed method is especially effective for rare domains and in few-shot learning settings.
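The sketch below illustrates the idea stated in the abstract, not the authors' released implementation: per-token hidden states from a (typically frozen) unidirectional LM are concatenated with states from a small backward LM that reads the sequence right-to-left, and the combined vector is fed to a token-classification head. The module names, dimensions, and the plain `nn.Module` stand-ins for the two LMs are illustrative assumptions.

```python
import torch
import torch.nn as nn


class BidirectionalTokenClassifier(nn.Module):
    """Hypothetical sketch: concatenate forward (unidirectional) LM states
    with states from a small backward LM, then classify each token."""

    def __init__(self, forward_lm: nn.Module, backward_lm: nn.Module,
                 fwd_dim: int, bwd_dim: int, num_labels: int):
        super().__init__()
        self.forward_lm = forward_lm    # e.g., a frozen large unidirectional LM
        self.backward_lm = backward_lm  # small LM trained on reversed text (assumed)
        self.classifier = nn.Linear(fwd_dim + bwd_dim, num_labels)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # The forward LM sees tokens left-to-right; assume it returns
        # per-token hidden states of shape (batch, seq, fwd_dim).
        h_fwd = self.forward_lm(input_ids)
        # The backward LM reads the reversed sequence, so each position
        # conditions on the tokens that follow it; flip its outputs back
        # so they align position-wise with the forward states.
        h_bwd = self.backward_lm(torch.flip(input_ids, dims=[1]))
        h_bwd = torch.flip(h_bwd, dims=[1])             # (batch, seq, bwd_dim)
        # Concatenate both views and predict a label for every token.
        h = torch.cat([h_fwd, h_bwd], dim=-1)
        return self.classifier(h)                        # (batch, seq, num_labels)
```

The backward states give each token access to right-hand context that the unidirectional LM alone cannot provide, which is the bidirectionality the abstract argues is missing.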
Anthology ID:
2025.coling-main.116
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
1711–1717
URL:
https://aclanthology.org/2025.coling-main.116/
Cite (ACL):
Takumi Goto, Hiroyoshi Nagao, and Yuta Koreeda. 2025. Acquiring Bidirectionality via Large and Small Language Models. In Proceedings of the 31st International Conference on Computational Linguistics, pages 1711–1717, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Acquiring Bidirectionality via Large and Small Language Models (Goto et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.116.pdf