Semformer: Transformer Language Models with Semantic Planning

Yongjing Yin; Junran Ding; Kai Song; Yue Zhang

Abstract

Next-token prediction serves as the dominant component in current neural language models. During the training phase, the model employs teacher forcing, which predicts tokens based on all preceding ground truth tokens. However, this approach has been found to create shortcuts, utilizing the revealed prefix to spuriously fit future tokens, potentially compromising the accuracy of the next-token predictor. In this paper, we introduce Semformer, a novel method of training a Transformer language model that explicitly models the semantic planning of the response. Specifically, we incorporate a sequence of planning tokens into the prefix, guiding the planning token representations to predict the latent semantic representations of the response, which are induced by an autoencoder. In a minimal planning task (i.e., graph path-finding), our model exhibits near-perfect performance and effectively mitigates shortcut learning, a feat that standard training methods and baseline models have been unable to accomplish. Furthermore, we pretrain Semformer from scratch with 125M parameters, demonstrating its efficacy through measures of perplexity, in-context learning, and fine-tuning on summarization tasks.
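
The training setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `<plan>` symbol, the function names, the use of mean squared error for the latent-prediction term, and the weight `alpha` are all assumptions made for illustration. The latent targets are taken as given here, whereas in the paper they are induced by an autoencoder over the response.

```python
import math

PLAN = "<plan>"  # hypothetical planning-token symbol, not taken from the paper


def build_input(prefix, response, num_plan):
    """Lay out one training sequence: prefix, planning tokens, then response."""
    return list(prefix) + [PLAN] * num_plan + list(response)


def next_token_loss(probs_of_gold):
    """Mean negative log-likelihood of the gold tokens under teacher forcing."""
    return -sum(math.log(p) for p in probs_of_gold) / len(probs_of_gold)


def mse(pred_vectors, target_vectors):
    """Mean squared error between predicted and target latent vectors."""
    total, count = 0.0, 0
    for pred, target in zip(pred_vectors, target_vectors):
        for a, b in zip(pred, target):
            total += (a - b) ** 2
            count += 1
    return total / count


def combined_loss(probs_of_gold, plan_states, latents, alpha=1.0):
    """Next-token loss plus a weighted latent-prediction term.

    plan_states: model representations at the planning-token positions.
    latents: target latent vectors for the response (assumed precomputed).
    alpha: assumed weighting hyperparameter for the latent-prediction term.
    """
    return next_token_loss(probs_of_gold) + alpha * mse(plan_states, latents)


if __name__ == "__main__":
    sequence = build_input(["a", "b"], ["c", "d"], num_plan=2)
    print(sequence)
    print(combined_loss([0.5, 0.25], [[0.0, 1.0]], [[0.0, 0.0]], alpha=0.5))
```

In this sketch the planning positions sit between the prefix and the response, so their representations can only depend on the prefix while being pushed toward a summary of the response that follows.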

Anthology ID: 2024.emnlp-main.1039
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 18669–18680
URL: https://aclanthology.org/2024.emnlp-main.1039
Cite (ACL): Yongjing Yin, Junran Ding, Kai Song, and Yue Zhang. 2024. Semformer: Transformer Language Models with Semantic Planning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 18669–18680, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Semformer: Transformer Language Models with Semantic Planning (Yin et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.1039.pdf
