Exploring Causal Mechanisms for Machine Text Detection Methods

Kiyoon Yoo, Wonhyuk Ahn, Yeji Song, Nojun Kwak


Abstract
The immense attention that ChatGPT has drawn to text generation has spurred the need to distinguish machine-generated text from human-written text. In this work, we provide preliminary evidence that the scores computed by existing zero-shot and supervised machine-generated text detection methods are not determined solely by the generated texts but are also affected by prompts and real texts. Using techniques from causal inference, we show that there exist backdoor paths that confound the relationship between a text and its detection score, and that this confounding bias can be partially mitigated. We open up new research directions in identifying other factors that may be interwoven in the detection of machine text. Our study calls for a deeper investigation into which kinds of prompts make the detection of machine text more difficult or easier.
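The idea of a backdoor path can be illustrated with a small toy model. The sketch below is purely hypothetical and is not the authors' causal graph or estimator: it assumes a binary prompt attribute Z (the confounder) that influences both a binary text attribute T and the detection score S, with made-up probabilities and coefficients. It contrasts the naive difference E[S | T=1] - E[S | T=0], which mixes in the path T <- Z -> S, with the standard backdoor adjustment formula, which averages within-stratum differences over P(Z).

```python
from itertools import product

# Hypothetical toy structural causal model (an assumption for illustration only):
#   Z: binary prompt attribute, the confounder
#   T: binary text attribute whose probability depends on Z
#   S: detection score, S = TRUE_EFFECT * T + CONFOUNDER_EFFECT * Z (noise-free)
P_Z1 = 0.5
P_T1_GIVEN_Z = {0: 0.2, 1: 0.8}
TRUE_EFFECT = 1.0
CONFOUNDER_EFFECT = 2.0


def joint():
    """Return (z, t, p(z, t), E[S | t, z]) for every cell of the toy model."""
    cells = []
    for z, t in product((0, 1), repeat=2):
        p_z = P_Z1 if z == 1 else 1 - P_Z1
        p_t = P_T1_GIVEN_Z[z] if t == 1 else 1 - P_T1_GIVEN_Z[z]
        score = TRUE_EFFECT * t + CONFOUNDER_EFFECT * z
        cells.append((z, t, p_z * p_t, score))
    return cells


def naive_effect(cells):
    """E[S | T=1] - E[S | T=0]; biased because it ignores the path T <- Z -> S."""
    def cond_mean(t):
        mass = sum(p for _, tt, p, _ in cells if tt == t)
        return sum(p * s for _, tt, p, s in cells if tt == t) / mass
    return cond_mean(1) - cond_mean(0)


def adjusted_effect(cells):
    """Backdoor adjustment: sum over z of P(z) * (E[S | T=1, z] - E[S | T=0, z])."""
    total = 0.0
    for z in (0, 1):
        p_z = sum(p for zz, _, p, _ in cells if zz == z)
        # The toy score is deterministic within a cell, so E[S | t, z] is just s.
        mean = {t: s for zz, t, _, s in cells if zz == z}
        total += p_z * (mean[1] - mean[0])
    return total


if __name__ == "__main__":
    cells = joint()
    print(round(naive_effect(cells), 3))     # 2.2 (confounded)
    print(round(adjusted_effect(cells), 3))  # 1.0 (matches TRUE_EFFECT)
```

In this idealized setting Z is fully observed, so adjustment recovers the effect exactly; the abstract reports that in practice the confounding bias of real detectors can only be partially mitigated.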
Anthology ID:
2024.trustnlp-1.7
Volume:
Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Anaelia Ovalle, Kai-Wei Chang, Yang Trista Cao, Ninareh Mehrabi, Jieyu Zhao, Aram Galstyan, Jwala Dhamala, Anoop Kumar, Rahul Gupta
Venues:
TrustNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
71–78
URL:
https://aclanthology.org/2024.trustnlp-1.7
DOI:
10.18653/v1/2024.trustnlp-1.7
Cite (ACL):
Kiyoon Yoo, Wonhyuk Ahn, Yeji Song, and Nojun Kwak. 2024. Exploring Causal Mechanisms for Machine Text Detection Methods. In Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024), pages 71–78, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Exploring Causal Mechanisms for Machine Text Detection Methods (Yoo et al., TrustNLP-WS 2024)
PDF:
https://aclanthology.org/2024.trustnlp-1.7.pdf