Unmasking the Imposters: How Censorship and Domain Adaptation Affect the Detection of Machine-Generated Tweets

Bryan E. Tuck, Rakesh Verma


Abstract
The rapid development of large language models (LLMs) has significantly improved the generation of fluent and convincing text, raising concerns about their potential misuse on social media platforms. We present a comprehensive methodology for creating nine Twitter datasets to examine the generative capabilities of four prominent LLMs: Llama 3, Mistral, Qwen2, and GPT4o. These datasets encompass four censored and five uncensored model configurations, including 7B- and 8B-parameter base-instruction models of the three open-source LLMs. Additionally, we perform a data quality analysis to assess the characteristics of textual outputs from humans and from “censored” and “uncensored” models, employing semantic meaning, lexical richness, structural patterns, content characteristics, and detector performance metrics to identify differences and similarities. Our evaluation demonstrates that “uncensored” models significantly undermine the effectiveness of automated detection methods. This study addresses a critical gap by exploring smaller open-source models and the ramifications of “uncensoring,” providing valuable insights into how domain adaptation and content moderation strategies influence both the detectability and structural characteristics of machine-generated text.
Anthology ID:
2025.coling-main.607
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
9044–9061
URL:
https://aclanthology.org/2025.coling-main.607/
Cite (ACL):
Bryan E. Tuck and Rakesh Verma. 2025. Unmasking the Imposters: How Censorship and Domain Adaptation Affect the Detection of Machine-Generated Tweets. In Proceedings of the 31st International Conference on Computational Linguistics, pages 9044–9061, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Unmasking the Imposters: How Censorship and Domain Adaptation Affect the Detection of Machine-Generated Tweets (Tuck & Verma, COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.607.pdf