Haotian Lu
2026
CARO: Chain-of-Analogy Reasoning Optimization for Robust Content Moderation
Bingzhe Wu | Haotian Lu | Yuchen Mou
Findings of the Association for Computational Linguistics: ACL 2026
Current large language models (LLMs), even those explicitly trained for reasoning, often struggle with ambiguous content moderation cases due to misleading "decision shortcuts" embedded in context. Inspired by cognitive psychology insights into expert moderation, we introduce CARO (Chain-of-Analogy Reasoning Optimization), a novel two-stage training framework to induce robust analogical reasoning in LLMs. First, CARO bootstraps analogical reasoning chains via retrieval-augmented generation (RAG) on moderation data and performs supervised fine-tuning (SFT). Second, we propose a customized direct preference optimization (DPO) approach to reinforce analogical reasoning behaviors explicitly. Unlike static retrieval methods, CARO dynamically generates tailored analogical references during inference, effectively mitigating harmful decision shortcuts. Extensive experiments demonstrate that CARO substantially outperforms state-of-the-art reasoning models (DeepSeek R1, QwQ), specialized moderation models (LLaMA Guard), and advanced fine-tuning and retrieval-augmented methods, achieving an average F1 score improvement of 24.9% on challenging ambiguous moderation benchmarks.
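The second stage of CARO builds on direct preference optimization (DPO). The abstract does not spell out how CARO customizes DPO, so the sketch below shows only the standard DPO loss for a single preference pair, as a point of reference. It assumes sequence-level log-probabilities are already computed, and it assumes (hypothetically, for illustration) that the "chosen" response contains an analogical reasoning chain while the "rejected" one does not. The function name `dpo_loss` and the default `beta` are illustrative choices, not the paper's.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (not CARO's customized variant).

    Inputs are sequence-level log-probabilities under the trainable policy
    and the frozen reference model. In a hypothetical CARO-style setup, the
    chosen response would use analogical reasoning and the rejected one would not.
    """
    # Log-ratio of policy to reference for each response (implicit reward / beta).
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin by which the policy prefers the chosen response.
    margin = beta * (chosen_ratio - rejected_ratio)
    # Loss is -log(sigmoid(margin)) = softplus(-margin), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log 2.
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Shifting probability toward the chosen response lowers the loss.
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

Minimizing this loss raises the policy's likelihood of the preferred response relative to the reference model, while `beta` controls how strongly the policy may drift from that reference.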
CHAIRO: Contextual Hierarchical Analogical Induction and Reasoning Optimization for LLMs
Haotian Lu | Yuchen Mou | Bingzhe Wu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Warning: This paper may contain content that could be disturbing or offensive. Content moderation in online platforms faces persistent challenges due to the evolving complexity of user-generated content and the limitations of traditional rule-based and machine learning approaches. While recent advances in large language models (LLMs) have enabled more sophisticated moderation via direct prompting or fine-tuning, these approaches often exhibit limited generalization, interpretability, and adaptability to unseen or ambiguous cases. In this work, we propose a novel moderation framework that leverages analogical examples to enhance rule induction and decision reliability. Our approach integrates end-to-end optimization of analogical retrieval, rule generation, and moderation classification, enabling the dynamic adaptation of moderation rules to diverse content scenarios. Through comprehensive experiments, we demonstrate that our method significantly outperforms both rule-injected fine-tuning baselines and multi-stage static RAG pipelines in terms of moderation accuracy and rule quality. Further evaluations, including human assessments and external model generalization tests, confirm the superiority of rules generated by our framework in terms of clarity, interpretability, and applicability. These findings highlight the potential of analogical example-driven methods for advancing robust, explainable, and generalizable content moderation in real-world applications.