Balaur: Language Model Pretraining with Lexical Semantic Relations

Andrei Mircea, Jackie Cheung


Abstract
Lexical semantic relations (LSRs) characterize meaning relationships between words and play an important role in systematic generalization on lexical inference tasks. Notably, several tasks that require knowledge of hypernymy still pose a challenge for pretrained language models (LMs) such as BERT, underscoring the need to better align their linguistic behavior with our knowledge of LSRs. In this paper, we propose Balaur, a model that addresses this challenge by modeling LSRs directly in the LM’s hidden states throughout pretraining. Motivating our approach is the hypothesis that the internal representations of LMs can provide an interface to their observable linguistic behavior, and that by controlling one we can influence the other. We validate our hypothesis and demonstrate that Balaur generally improves the performance of large transformer-based LMs on a comprehensive set of hypernymy-informed tasks, as well as on the original LM objective. Code and data are made available at https://github.com/mirandrom/balaur
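
To make the high-level idea concrete, below is a minimal, hypothetical sketch of an auxiliary objective that encourages the hidden states of words standing in a lexical semantic relation (e.g. the hypernym pair "dog" → "animal") to align during pretraining, added alongside the usual masked language modeling loss. This is not the authors' released implementation (see the repository linked above for the actual code); the function name lsr_auxiliary_loss, the pair-index format, and the temperature hyperparameter are illustrative assumptions.

```python
# Hypothetical sketch, not the Balaur implementation: a contrastive auxiliary
# loss on transformer hidden states for related word pairs.
import torch
import torch.nn.functional as F

def lsr_auxiliary_loss(hidden_states: torch.Tensor,
                       pair_indices: torch.LongTensor,
                       temperature: float = 0.07) -> torch.Tensor:
    """Contrastive loss over token hidden states for related word pairs.

    hidden_states: (num_tokens, dim) hidden states from one transformer layer.
    pair_indices:  (num_pairs, 2) token indices (i, j) where token i and token j
                   stand in a lexical semantic relation (e.g. hyponym -> hypernym).
    """
    h = F.normalize(hidden_states, dim=-1)        # unit-norm hidden states
    anchors = h[pair_indices[:, 0]]               # (num_pairs, dim) relation sources
    logits = anchors @ h.T / temperature          # similarity of each anchor to all tokens
    targets = pair_indices[:, 1]                  # the related token is the positive class
    return F.cross_entropy(logits, targets)

# During pretraining, such a term would be combined with the MLM objective, e.g.:
#   total_loss = mlm_loss + lambda_lsr * lsr_auxiliary_loss(hidden, pairs)
# where lambda_lsr is an assumed weighting hyperparameter.
```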
Anthology ID: 2023.findings-emnlp.674
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 10054–10070
URL: https://aclanthology.org/2023.findings-emnlp.674
DOI: 10.18653/v1/2023.findings-emnlp.674
Cite (ACL): Andrei Mircea and Jackie Cheung. 2023. Balaur: Language Model Pretraining with Lexical Semantic Relations. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10054–10070, Singapore. Association for Computational Linguistics.
Cite (Informal): Balaur: Language Model Pretraining with Lexical Semantic Relations (Mircea & Cheung, Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.674.pdf