Yeji Song
2024
Exploring Causal Mechanisms for Machine Text Detection Methods
Kiyoon Yoo | Wonhyuk Ahn | Yeji Song | Nojun Kwak
Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)
The immense attention that ChatGPT has drawn to text generation has spurred the need to discriminate machine text from human text. In this work, we provide preliminary evidence that the scores computed by existing zero-shot and supervised machine-generated text detection methods are not solely determined by the generated texts, but are affected by prompts and real texts as well. Using techniques from causal inference, we show the existence of backdoor paths that confound the relationship between a text and its detection score, and show how the confounding bias can be partially mitigated. We open up new research directions in identifying other factors that may be interwoven in the detection of machine text. Our study calls for a deeper investigation into which kinds of prompts make the detection of machine text more difficult or easier.