Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation

Zixuan Wang; Jinghao Shi; Hanzhong Liang; Xiang Shen; Vera Wen; Zhiqian Chen; Yifan Wu; Zhixin Zhang; Hongyu Xiong

doi:10.18653/v1/2025.acl-industry.62

Abstract

Effective content moderation is essential for video platforms to safeguard user experience and uphold community standards. While traditional video classification models effectively handle well-defined moderation tasks, they struggle with complicated scenarios such as implicit harmful content and contextual ambiguity. Multimodal large language models (MLLMs) offer a promising solution to these limitations with their superior cross-modal reasoning and contextual understanding. However, two key challenges hinder their industrial adoption. First, the high computational cost of MLLMs makes full-scale deployment impractical. Second, adapting generative models for discriminative classification remains an open research problem. In this paper, we first introduce an efficient method to transform a generative MLLM into a multimodal classifier using minimal discriminative training data. To enable industry-scale deployment, we then propose a router-ranking cascade system that integrates MLLMs with a lightweight router model. Offline experiments demonstrate that our MLLM-based approach improves F1 score by 66.50% over traditional classifiers while requiring only 2% of the fine-tuning data. Online evaluations show that our system increases automatic content moderation volume by 41%, while the cascading deployment reduces computational cost to only 1.5% of direct full-scale deployment.
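The router-ranking cascade described above can be illustrated with a minimal sketch. It assumes a cheap router that scores every video and an expensive MLLM classifier that is called only on videos the router does not filter out. The names `cascade`, `router_score`, `mllm_score`, and both threshold values are hypothetical placeholders; the abstract does not specify the router architecture, the scoring functions, or the thresholds used in the deployed system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Decision:
    video_id: str
    router_score: float
    mllm_score: Optional[float]  # None when the MLLM stage was skipped
    flagged: bool


def cascade(
    video_ids: Sequence[str],
    router_score: Callable[[str], float],  # hypothetical lightweight model
    mllm_score: Callable[[str], float],    # hypothetical MLLM classifier
    router_threshold: float = 0.2,         # assumed value, not from the paper
    mllm_threshold: float = 0.5,           # assumed value, not from the paper
) -> List[Decision]:
    """Filter with the cheap router, then refine survivors with the MLLM."""
    decisions = []
    for vid in video_ids:
        r = router_score(vid)
        if r < router_threshold:
            # Filter stage: low-risk videos never reach the MLLM,
            # which is where the compute saving comes from.
            decisions.append(Decision(vid, r, None, False))
            continue
        # Refine stage: only the remaining candidates pay the MLLM cost.
        m = mllm_score(vid)
        decisions.append(Decision(vid, r, m, m >= mllm_threshold))
    return decisions
```

In a sketch like this, the fraction of traffic that passes the router threshold determines how often the MLLM runs, so the router threshold is the main lever for trading recall against compute.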

Anthology ID: 2025.acl-industry.62
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Georg Rehm, Yunyao Li
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 873–880
URL: https://aclanthology.org/2025.acl-industry.62/
DOI: 10.18653/v1/2025.acl-industry.62
Cite (ACL): Zixuan Wang, Jinghao Shi, Hanzhong Liang, Xiang Shen, Vera Wen, Zhiqian Chen, Yifan Wu, Zhixin Zhang, and Hongyu Xiong. 2025. Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 873–880, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation (Wang et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-industry.62.pdf
