The Rise of AI-Generated Content in Wikipedia

Creston Brooks; Samuel Eggert; Denis Peskoff

doi:10.18653/v1/2024.wikinlp-1.12

Abstract

The rise of AI-generated content in popular information sources raises significant concerns about accountability, accuracy, and bias amplification. Beyond directly impacting consumers, the widespread presence of this content poses questions for the long-term viability of training language models on vast internet sweeps. We use GPTZero, a proprietary AI detector, and Binoculars, an open-source alternative, to establish lower bounds on the presence of AI-generated content in recently created Wikipedia pages. Both detectors reveal a marked increase in AI-generated content in recent pages compared to those from before the release of GPT-3.5. With thresholds calibrated to achieve a 1% false positive rate on pre-GPT-3.5 articles, detectors flag over 5% of newly created English Wikipedia articles as AI-generated, with lower percentages for German, French, and Italian articles. Flagged Wikipedia articles are typically of lower quality and are often self-promotional or partial towards a specific viewpoint on controversial topics.
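The calibration step described in the abstract can be illustrated with a minimal sketch. It assumes each article has already been assigned a numeric detector score, and that a higher score means "more likely AI-generated"; if a detector's scores run the other way, negate them first. The function names and the synthetic scores are hypothetical and for illustration only; they are not the authors' code or data.

```python
import random


def calibrate_threshold(pre_scores, target_fpr=0.01):
    """Pick a threshold so that at most `target_fpr` of the reference
    (pre-GPT-3.5) articles score strictly above it."""
    ranked = sorted(pre_scores, reverse=True)
    # Maximum number of reference articles we allow to be flagged.
    allowed = int(target_fpr * len(ranked))
    # Everything strictly above the (allowed + 1)-th highest score is flagged,
    # so at most `allowed` reference articles exceed the threshold.
    return ranked[allowed]


def flagged_fraction(scores, threshold):
    """Fraction of articles whose score is strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


if __name__ == "__main__":
    # Synthetic scores, purely illustrative (not real detector output).
    rng = random.Random(0)
    pre_scores = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    new_scores = [rng.gauss(0.3, 1.2) for _ in range(2000)]

    threshold = calibrate_threshold(pre_scores, target_fpr=0.01)
    print("threshold:", threshold)
    print("flagged among reference articles:", flagged_fraction(pre_scores, threshold))
    print("flagged among new articles:", flagged_fraction(new_scores, threshold))
```

Because the threshold is fixed on articles that predate GPT-3.5, any flag rate among the reference articles is by construction a false positive rate, and the excess flag rate among newer articles is what supports a lower-bound reading.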

Anthology ID: 2024.wikinlp-1.12
Volume: Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Lucie-Aimée Kaffee, Angela Fan, Tajuddeen Gwadabe, Isaac Johnson, Fabio Petroni, Daniel van Strien
Venues: WikiNLP | WS
Publisher: Association for Computational Linguistics
Pages: 67–79
URL: https://aclanthology.org/2024.wikinlp-1.12/
DOI: 10.18653/v1/2024.wikinlp-1.12
Cite (ACL): Creston Brooks, Samuel Eggert, and Denis Peskoff. 2024. The Rise of AI-Generated Content in Wikipedia. In Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia, pages 67–79, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): The Rise of AI-Generated Content in Wikipedia (Brooks et al., WikiNLP 2024)
PDF: https://aclanthology.org/2024.wikinlp-1.12.pdf