Jing Ke
2026
To Paraphrase or Not: Efficient Comment Detoxification with Unsupervised Detoxifiability Discrimination
Jing Ke | Zheyong Xie | Shaosheng Cao | Tong Xu | Enhong Chen
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Mitigating toxic content is critical for maintaining a healthy social platform, yet existing detoxification systems face significant limitations: overcorrection from uniformly processing all toxic comments, and parallel-data scarcity when training paraphrasing models. To tackle these challenges, we propose Detoxifiability-Aware Detoxification (DID), a novel paradigm that adaptively filters or paraphrases each toxic comment based on its detoxifiability, i.e., whether its essential meaning can be paraphrased into a benign comment. Specifically, DID integrates three core modules: (1) an unsupervised detoxifiability discriminator, (2) a semantic purification module that extracts harmful intent and then performs targeted paraphrasing only on detoxifiable comments, and (3) a feedback-adaptive refinement loop that reprocesses remaining harmful content only when it is detoxifiable. Experimental results demonstrate that DID significantly outperforms existing approaches on academic data and an industrial platform, establishing a novel and practical modeling paradigm for comment detoxification.
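The control flow the abstract describes — discriminate, then filter or paraphrase, then refine with feedback — can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the toy lexicon and the `is_toxic`, `is_detoxifiable`, and `purify` stubs stand in for the paper's learned toxicity model, unsupervised discriminator, and semantic purification module.

```python
# Toy lexicon; stands in for a real toxicity classifier (assumption for illustration).
TOXIC_WORDS = {"idiot", "stupid"}

def is_toxic(comment: str) -> bool:
    """Stub toxicity check: does the comment still contain a toxic token?"""
    return any(w in TOXIC_WORDS for w in comment.lower().split())

def is_detoxifiable(comment: str) -> bool:
    """Stand-in for the unsupervised discriminator: detoxifiable if some
    benign content would remain once toxic tokens are stripped."""
    return any(w not in TOXIC_WORDS for w in comment.lower().split())

def purify(comment: str) -> str:
    """Stand-in for semantic purification: drop toxic tokens, keep the rest."""
    return " ".join(w for w in comment.split() if w.lower() not in TOXIC_WORDS)

def did_detoxify(comment: str, max_rounds: int = 3):
    """Adaptive routing: filter undetoxifiable comments (return None),
    otherwise paraphrase, with a feedback-adaptive refinement loop."""
    if not is_detoxifiable(comment):
        return None  # filtered: no benign core to preserve
    for _ in range(max_rounds):
        comment = purify(comment)
        if not is_toxic(comment):
            return comment  # clean paraphrase produced
        if not is_detoxifiable(comment):
            return None  # residual harm with no benign core: filter
    return None
```

The key design point the sketch captures is that filtering and paraphrasing are not alternatives applied uniformly: the discriminator routes each comment, and the refinement loop re-applies the same routing decision to whatever harmful content survives a purification pass.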