Long Text Generation with Topic-aware Discrete Latent Variable Model

Erguang Yang, Mingtong Liu, Deyi Xiong, Yujie Zhang, Yufeng Chen, Jinan Xu


Abstract
Generating coherent long texts is an important yet challenging task, particularly for open-ended generation. Prior work based on discrete latent codes focuses on modeling discourse relations, so the discrete codes learn only shallow semantics (Ji and Huang, 2021). A natural text typically revolves around several related topics, with smooth transitions between them. In this work, we investigate whether discrete latent codes can capture topic information. To this end, we build a topic-aware latent code-guided text generation model. To encourage the discrete codes to model topic information, we propose a span-level bag-of-words training objective. Automatic and manual evaluations show that our method generates more topic-relevant and coherent texts.
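The span-level bag-of-words objective can be pictured as an auxiliary loss in which each discrete latent code is trained to predict the words of the text span it governs. Below is a minimal PyTorch sketch of such a loss; the class name, tensor shapes, and the exact formulation are illustrative assumptions, not the paper's implementation.

# Minimal sketch of a span-level bag-of-words auxiliary loss (illustrative only;
# names, shapes, and the exact formulation are assumptions, not from the paper).
# Each discrete latent code is trained to predict the bag of words of the span
# of text it corresponds to.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanBagOfWordsLoss(nn.Module):
    def __init__(self, code_dim: int, vocab_size: int):
        super().__init__()
        # Projects a latent-code embedding to a distribution over the vocabulary.
        self.proj = nn.Linear(code_dim, vocab_size)

    def forward(self, code_embs: torch.Tensor, span_token_ids: torch.Tensor,
                span_mask: torch.Tensor) -> torch.Tensor:
        # code_embs:      (num_spans, code_dim)     embedding of the code per span
        # span_token_ids: (num_spans, max_span_len) token ids in each span
        # span_mask:      (num_spans, max_span_len) 1.0 for real tokens, 0.0 for pad
        log_probs = F.log_softmax(self.proj(code_embs), dim=-1)  # (num_spans, V)
        # Log-probability of every span token under the code's predicted
        # word distribution, ignoring padding positions.
        token_log_probs = log_probs.gather(-1, span_token_ids)   # (num_spans, L)
        nll = -(token_log_probs * span_mask).sum() / span_mask.sum().clamp(min=1)
        return nll

# Usage sketch (hyperparameters and the loss weighting are assumptions):
# bow_loss = SpanBagOfWordsLoss(code_dim=768, vocab_size=50257)
# aux = bow_loss(code_embs, span_token_ids, span_mask)
# total_loss = lm_loss + aux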
Anthology ID: 2022.emnlp-main.554
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 8100–8107
URL: https://aclanthology.org/2022.emnlp-main.554
DOI: 10.18653/v1/2022.emnlp-main.554
Cite (ACL): Erguang Yang, Mingtong Liu, Deyi Xiong, Yujie Zhang, Yufeng Chen, and Jinan Xu. 2022. Long Text Generation with Topic-aware Discrete Latent Variable Model. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8100–8107, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Long Text Generation with Topic-aware Discrete Latent Variable Model (Yang et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.554.pdf