Reinforcement Learning–Guided Adaptive Tuning for Out-of-Distribution Harmful Text Detection

Mengyu Xiang; Tinghao Chen; Boxu Han; Qiudan Li; Shu Wu; Daniel Dajun Zeng

Abstract

As social media grows, harmful information spreads rapidly across platforms and evolves over time, exhibiting cross-platform and cross-temporal variations. Existing methods keep the parameters learned during training fixed at test time, so they fail to handle substantial semantic discrepancies, leading to Out-Of-Distribution (OOD) problems. Test-time tuning allows parameters to be adjusted dynamically, but it may over-adapt to individual samples. The key challenge is therefore how to adapt to semantic variations during testing while preventing the overfitting caused by continuous tuning. To tackle this issue, this paper proposes RLAT, a reinforcement learning (RL)–guided adaptive tuning method for harmful text detection. First, a tuning joint optimization module updates parameters to adapt to semantic variations during testing. It tunes the model by optimizing a consistency loss and applying word-level attention constraints, which reduce over-reliance on local words and help the model learn a more robust global representation. Then, to mitigate the overfitting caused by continuous tuning, an RL–guided adaptive decision model is introduced to direct the tuning process. It reduces the influence of local samples by selecting data and controlling parameter updates, thereby improving overall test performance. Experimental results on multiple public datasets show that RLAT outperforms state-of-the-art baselines in both cross-platform and cross-temporal scenarios.
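The abstract names three ingredients (a consistency loss, a word-level attention constraint, and an RL-guided decision model that controls whether parameters are updated) but gives no formulas. The following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the consistency loss is a symmetric KL divergence between predictions on two views of the same text. It models the attention constraint as a penalty on how concentrated the attention weights are. It replaces the decision model with a simple Bernoulli "tune / skip" gate trained by REINFORCE. All names (`consistency_loss`, `attention_penalty`, `tuning_objective`, `UpdateGate`) and the weight `lam` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(p, q):
    """Assumed form: symmetric KL between predictions on two views of one text."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def attention_penalty(attn):
    """Assumed form of a word-level attention constraint. For weights summing
    to 1 over n words, the sum of squares is smallest (1/n) when attention is
    spread evenly and largest (1.0) when it sits on a single word."""
    return sum(a * a for a in attn)


def tuning_objective(p, q, attn, lam=0.1):
    """Hypothetical test-time objective: consistency + weighted attention penalty."""
    return consistency_loss(p, q) + lam * attention_penalty(attn)


def sigmoid(z):
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class UpdateGate:
    """Hypothetical stand-in for the RL-guided decision model: a Bernoulli
    policy that decides, per test sample, whether to apply a tuning step."""

    def __init__(self, n_features, lr=0.1, seed=0):
        self.w = [0.0] * n_features
        self.lr = lr
        self.rng = random.Random(seed)

    def prob_update(self, state):
        """Probability of choosing 'tune on this sample'."""
        return sigmoid(sum(wi * si for wi, si in zip(self.w, state)))

    def act(self, state):
        """Sample action 1 (tune) or 0 (skip); also return its probability."""
        p = self.prob_update(state)
        return (1 if self.rng.random() < p else 0), p

    def reinforce(self, state, action, prob, reward, baseline=0.0):
        """REINFORCE step. For a sigmoid-Bernoulli policy,
        d log pi(a | s) / dw = (a - p) * s."""
        advantage = reward - baseline
        for i, si in enumerate(state):
            self.w[i] += self.lr * advantage * (action - prob) * si


if __name__ == "__main__":
    p = softmax([2.0, 0.5])   # prediction on the original text
    q = softmax([1.5, 0.8])   # prediction on a perturbed view (assumed)
    attn = [0.7, 0.2, 0.1]    # toy word-level attention weights
    print("objective:", tuning_objective(p, q, attn))

    gate = UpdateGate(n_features=2)
    state = [consistency_loss(p, q), 1.0]  # feature + bias term (assumed state)
    action, prob = gate.act(state)
    # The reward signal is an assumption; the abstract does not define it.
    gate.reinforce(state, action, prob, reward=1.0)
    print("action:", action, "new p(update):", gate.prob_update(state))
```

The gate only illustrates the general idea of "selecting data and controlling parameter updates" with a learned policy; how RLAT defines its state, actions, and reward is described in the paper itself, not here.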

Anthology ID: 2026.acl-long.1623
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 35158–35174
URL: https://aclanthology.org/2026.acl-long.1623/
Cite (ACL): Mengyu Xiang, Tinghao Chen, Boxu Han, Qiudan Li, Shu Wu, and Daniel Dajun Zeng. 2026. Reinforcement Learning–Guided Adaptive Tuning for Out-of-Distribution Harmful Text Detection. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 35158–35174, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Reinforcement Learning–Guided Adaptive Tuning for Out-of-Distribution Harmful Text Detection (Xiang et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1623.pdf
Checklist: 2026.acl-long.1623.checklist.pdf