@inproceedings{fitterer-etal-2025-testing,
    title = "Testing {English} News Articles for Lexical Homogenization Due to Widespread Use of Large Language Models",
    author = "Fitterer, Sarah and
      Gangl, Dominik and
      Ulbrich, Jannes",
    editor = "Zhao, Jin and
      Wang, Mingyang and
      Liu, Zhu",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-srw.95/",
    doi = "10.18653/v1/2025.acl-srw.95",
    pages = "1239--1245",
    isbn = "979-8-89176-254-1",
    abstract = "It is widely assumed that Large Language Models (LLMs) are shaping language, with multiple studies noting the growing presence of LLM-generated content and suggesting homogenizing effects. However, it remains unclear if these effects are already evident in recent writing. This study addresses that gap by comparing two datasets of English online news articles {--} one from 2018, prior to LLM popularization, and one from 2024, after widespread LLM adoption. We define lexical homogenization as a decrease in lexical diversity, measured by the MATTR, Maas, and MTLD metrics, and introduce the LLM-Style-Word Ratio (SWR) to measure LLM influence. We found higher MTLD and SWR scores, yet negligible changes in Maas and MATTR scores in 2024 corpus. We conclude that while there is an apparent influence of LLMs on written online English, homogenization effects do not show in the measurements. We therefore propose to apply different metrics to measure lexical homogenization in future studies on the influence of LLM usage on language change."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="fitterer-etal-2025-testing">
<titleInfo>
<title>Testing English News Articles for Lexical Homogenization Due to Widespread Use of Large Language Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Sarah</namePart>
<namePart type="family">Fitterer</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dominik</namePart>
<namePart type="family">Gangl</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jannes</namePart>
<namePart type="family">Ulbrich</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Jin</namePart>
<namePart type="family">Zhao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mingyang</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhu</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-254-1</identifier>
</relatedItem>
<abstract>It is widely assumed that Large Language Models (LLMs) are shaping language, with multiple studies noting the growing presence of LLM-generated content and suggesting homogenizing effects. However, it remains unclear if these effects are already evident in recent writing. This study addresses that gap by comparing two datasets of English online news articles – one from 2018, prior to LLM popularization, and one from 2024, after widespread LLM adoption. We define lexical homogenization as a decrease in lexical diversity, measured by the MATTR, Maas, and MTLD metrics, and introduce the LLM-Style-Word Ratio (SWR) to measure LLM influence. We found higher MTLD and SWR scores, yet negligible changes in Maas and MATTR scores in 2024 corpus. We conclude that while there is an apparent influence of LLMs on written online English, homogenization effects do not show in the measurements. We therefore propose to apply different metrics to measure lexical homogenization in future studies on the influence of LLM usage on language change.</abstract>
<identifier type="citekey">fitterer-etal-2025-testing</identifier>
<identifier type="doi">10.18653/v1/2025.acl-srw.95</identifier>
<location>
<url>https://aclanthology.org/2025.acl-srw.95/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>1239</start>
<end>1245</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Testing English News Articles for Lexical Homogenization Due to Widespread Use of Large Language Models
%A Fitterer, Sarah
%A Gangl, Dominik
%A Ulbrich, Jannes
%Y Zhao, Jin
%Y Wang, Mingyang
%Y Liu, Zhu
%S Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-254-1
%F fitterer-etal-2025-testing
%X It is widely assumed that Large Language Models (LLMs) are shaping language, with multiple studies noting the growing presence of LLM-generated content and suggesting homogenizing effects. However, it remains unclear if these effects are already evident in recent writing. This study addresses that gap by comparing two datasets of English online news articles – one from 2018, prior to LLM popularization, and one from 2024, after widespread LLM adoption. We define lexical homogenization as a decrease in lexical diversity, measured by the MATTR, Maas, and MTLD metrics, and introduce the LLM-Style-Word Ratio (SWR) to measure LLM influence. We found higher MTLD and SWR scores, yet negligible changes in Maas and MATTR scores in 2024 corpus. We conclude that while there is an apparent influence of LLMs on written online English, homogenization effects do not show in the measurements. We therefore propose to apply different metrics to measure lexical homogenization in future studies on the influence of LLM usage on language change.
%R 10.18653/v1/2025.acl-srw.95
%U https://aclanthology.org/2025.acl-srw.95/
%U https://doi.org/10.18653/v1/2025.acl-srw.95
%P 1239-1245
Markdown (Informal)
[Testing English News Articles for Lexical Homogenization Due to Widespread Use of Large Language Models](https://aclanthology.org/2025.acl-srw.95/) (Fitterer et al., ACL 2025)
ACL