STAND-Guard: A Small Task-Adaptive Content Moderation Model

Minjia Wang; Pingping Lin; Siqi Cai; Shengnan An; Shengjie Ma; Zeqi Lin; Congrui Huang; Bixiong Xu

Abstract

Content moderation, the process of reviewing and monitoring the safety of generated content, is important for the development of welcoming online platforms and responsible large language models. Content moderation comprises various tasks, each with unique requirements tailored to specific scenarios. Therefore, it is crucial to develop a model that can be adapted easily and accurately to novel or customized content moderation tasks without extensive model tuning. This paper presents STAND-Guard, a Small Task-Adaptive coNtent moDeration model. The basic motivation is: by performing instruction tuning on various content moderation tasks, we can unleash the power of small language models (SLMs) on unseen (out-of-distribution) content moderation tasks. We also carefully study the effects of training tasks and model size on the efficacy of the cross-task fine-tuning mechanism. Experiments demonstrate that STAND-Guard is comparable to GPT-3.5-Turbo across over 40 public datasets, as well as proprietary datasets derived from real-world business scenarios. Remarkably, STAND-Guard achieved nearly equivalent results to GPT-4-Turbo on unseen English binary classification tasks.

Anthology ID: 2025.coling-industry.1
Volume: Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 1–20
URL: https://aclanthology.org/2025.coling-industry.1/
Cite (ACL): Minjia Wang, Pingping Lin, Siqi Cai, Shengnan An, Shengjie Ma, Zeqi Lin, Congrui Huang, and Bixiong Xu. 2025. STAND-Guard: A Small Task-Adaptive Content Moderation Model. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 1–20, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): STAND-Guard: A Small Task-Adaptive Content Moderation Model (Wang et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-industry.1.pdf
