Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation

Hongyi Yuan, Zheng Yuan, Chuanqi Tan, Fei Huang, Songfang Huang


Abstract
The diffusion model, a new generative modeling paradigm, has achieved great success in image, audio, and video generation. However, considering the discrete categorical nature of the text, it is not trivial to extend continuous diffusion models to natural language. In this work, we propose SeqDiffuSeq, a text diffusion model, to approach sequence-to-sequence text generation with an encoder-decoder Transformer architecture. To improve the generation performance, SeqDiffuSeq is equipped with the self-conditioning technique and our newly proposed adaptive noise schedule technique. Self-conditioning enables SeqDiffuSeq to better use the predicted sequence information during the generation process. The adaptive noise schedule balances the difficulty of denoising across time steps at the token level. Experiment results illustrate the improved performance on five sequence-to-sequence generation tasks compared to other diffusion-based models regarding text quality and inference time.
Anthology ID:
2024.naacl-long.2
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
22–39
URL:
https://aclanthology.org/2024.naacl-long.2
DOI:
10.18653/v1/2024.naacl-long.2
Cite (ACL):
Hongyi Yuan, Zheng Yuan, Chuanqi Tan, Fei Huang, and Songfang Huang. 2024. Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 22–39, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation (Yuan et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.2.pdf