Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Siyue Zhang; Yilun Zhao; Liyuan Geng; Arman Cohan; Luu Anh Tuan; Chen Zhao

doi:10.18653/v1/2025.emnlp-main.213

Abstract

Large language model (LLM)-based embedding models, benefiting from large-scale pre-training and post-training, have begun to surpass BERT- and T5-based models on general-purpose text embedding tasks such as document retrieval. However, a fundamental limitation of LLM embeddings lies in the unidirectional attention used during autoregressive pre-training, which is misaligned with the bidirectional nature of text embedding tasks. To this end, we propose adopting diffusion language models for text embeddings, motivated by their inherently bidirectional architecture and their recent success in matching or surpassing LLMs, especially on reasoning tasks. We present the first systematic study of the diffusion language embedding model, which outperforms the LLM-based embedding model by 20% on long-document retrieval, 8% on reasoning-intensive retrieval, and 2% on instruction-following retrieval, and achieves competitive performance on traditional text embedding benchmarks. Our analysis verifies that bidirectional attention is crucial for encoding global context in long and complex text.
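The abstract's central argument is that causal (unidirectional) attention prevents earlier tokens from seeing later context, whereas bidirectional attention lets every token attend to the whole text. The toy sketch below illustrates only that difference; it is not the paper's diffusion embedding model or training recipe. It assumes a single attention head with identity query/key/value projections and mean pooling, both hypothetical simplifications chosen for clarity, and the helper names (`self_attention`, `mean_pool`) are invented for this example.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: List[Vector], causal: bool) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Hypothetical simplification: the input vectors are used directly as
    queries, keys, and values (identity projections).
    """
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        # Causal mask: token i sees positions 0..i only.
        # Bidirectional: token i sees every position in the sequence.
        visible = range(i + 1) if causal else range(len(x))
        scores = [sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d) for j in visible]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors, one dimension at a time.
        out.append([sum(w * x[j][k] for w, j in zip(weights, visible)) for k in range(d)])
    return out


def mean_pool(h: List[Vector]) -> Vector:
    # Average token states into one fixed-size embedding (assumed pooling choice).
    n = len(h)
    return [sum(v[k] for v in h) / n for k in range(len(h[0]))]


# Two toy sequences that share a prefix and differ only in the final token.
seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seq_b = [[1.0, 0.0], [0.0, 1.0], [-1.0, 2.0]]

causal_a = self_attention(seq_a, causal=True)
causal_b = self_attention(seq_b, causal=True)
bi_a = self_attention(seq_a, causal=False)
bi_b = self_attention(seq_b, causal=False)

print(causal_a[0] == causal_b[0])  # True: token 0 never sees the final token
print(bi_a[0] == bi_b[0])          # False: token 0 attends to the whole sequence
print(mean_pool(bi_a), mean_pool(bi_b))
```

Under the causal mask, the first token's state is identical for both sequences because it never attends to the differing final token; only the last position sees the full input. With the bidirectional mask, every position, and hence any pooled embedding, reflects the whole sequence, which is the property the abstract credits for better encoding of global context.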

Anthology ID: 2025.emnlp-main.213
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 4273–4303
URL: https://aclanthology.org/2025.emnlp-main.213/
DOI: 10.18653/v1/2025.emnlp-main.213
Cite (ACL): Siyue Zhang, Yilun Zhao, Liyuan Geng, Arman Cohan, Anh Tuan Luu, and Chen Zhao. 2025. Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4273–4303, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective (Zhang et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.213.pdf
Checklist: 2025.emnlp-main.213.checklist.pdf

PDF Cite Search Checklist Fix data