The Price of Format: Diversity Collapse in LLMs

Longfei Yun; Chenyang An; Zilong Wang; Letian Peng; Jingbo Shang

doi:10.18653/v1/2025.findings-emnlp.836

Abstract

Instruction-tuned large language models (LLMs) employ structured templates, such as role markers and special tokens, to enforce format consistency during inference. However, we identify a critical limitation of such formatting: it induces a phenomenon we term diversity collapse, where the model generates semantically similar outputs for open-ended inputs, undermining creativity and variability. We systematically evaluate this effect across tasks like story completion and free-form generation, finding that (1) diversity collapse persists even under high-temperature sampling, and (2) structural tokens in templates significantly constrain the model’s output space. To contextualize these findings, we fine-tune models using a range of structured prompts and then evaluate them across three axes: downstream task performance, alignment behavior, and output diversity. Our analysis shows that format consistency between fine-tuning and inference is crucial for structure-sensitive tasks (e.g., GSM8K, IFEval), but has marginal influence on knowledge-heavy tasks (e.g., MMLU, WebQuestions). In contrast, output diversity is primarily governed by the presence or absence of structural tokens, with minimal formatting yielding the most diverse outputs. These findings reveal that current prompting conventions, while beneficial for alignment, may inadvertently suppress output diversity, underscoring the need for diversity-aware prompt design and instruction tuning.

Anthology ID: 2025.findings-emnlp.836
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 15454–15468
URL: https://aclanthology.org/2025.findings-emnlp.836/
DOI: 10.18653/v1/2025.findings-emnlp.836
Cite (ACL): Longfei Yun, Chenyang An, Zilong Wang, Letian Peng, and Jingbo Shang. 2025. The Price of Format: Diversity Collapse in LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 15454–15468, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): The Price of Format: Diversity Collapse in LLMs (Yun et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.836.pdf
Checklist: 2025.findings-emnlp.836.checklist.pdf
