Detecting Mode Collapse in Language Models via Narration

Sil Hamilton


Abstract
No two authors write alike. The personal flourishes invoked in written narratives, from lexicon to rhetorical devices, imply a particular author: what literary theorists call the implied or virtual author, a figure distinct from both the real author and the narrator of a text. Early large language models, trained on unfiltered corpora drawn from many discordant sources, exhibited incoherent personalities; this was problematic for conversational tasks but proved useful for sampling literature from multiple perspectives. Recent successes in alignment research have allowed researchers to impose subjectively consistent personae on language models via instruction tuning and reinforcement learning from human feedback (RLHF), but whether aligned models retain the ability to model an arbitrary virtual author has received little scrutiny. By studying 4,374 stories sampled from three OpenAI language models, we show that successive versions of GPT-3 suffer from increasing degrees of “mode collapse,” whereby overfitting during alignment prevents the model from generalizing over authorship: models suffering from mode collapse become unable to assume a multiplicity of perspectives. Our method and results are significant for researchers seeking to employ language models in sociological simulations.
Anthology ID:
2024.scalellm-1.5
Volume:
Proceedings of the First edition of the Workshop on the Scaling Behavior of Large Language Models (SCALE-LLM 2024)
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Antonio Valerio Miceli-Barone, Fazl Barez, Shay Cohen, Elena Voita, Ulrich Germann, Michal Lukasik
Venues:
SCALE-LLM | WS
Publisher:
Association for Computational Linguistics
Pages:
65–72
URL:
https://aclanthology.org/2024.scalellm-1.5
Cite (ACL):
Sil Hamilton. 2024. Detecting Mode Collapse in Language Models via Narration. In Proceedings of the First edition of the Workshop on the Scaling Behavior of Large Language Models (SCALE-LLM 2024), pages 65–72, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
Detecting Mode Collapse in Language Models via Narration (Hamilton, SCALE-LLM-WS 2024)
PDF:
https://aclanthology.org/2024.scalellm-1.5.pdf