Black-Box Membership Inference Attacks for Video Training Data in Multimodal Large Language Models

Jinrui Wang; Zhenfeng Gao; Wendan Wang; Huili Wang; Zichen Qin; Linjie Zhu; Hongke Fu; Shangguang Wang; Tao Qi

Abstract

The increasing use of video data in training multimodal large language models (MLLMs) raises significant concerns about privacy leakage and copyright violations, highlighting the need to detect improperly used training videos through membership inference attacks (MIAs). Most existing video MIA methods assess model memorization of key semantic concepts within a video (e.g., the name of a well-known movie character). However, such concepts usually appear repeatedly throughout the training corpus, so memorization of them does not constitute reliable evidence that a specific video was used during training. Although some methods mitigate this limitation by capturing relationships between frames, they require access to model logits and are therefore impractical in realistic black-box scenarios. To address these challenges, we propose a black-box MIA framework, named VideoMIA, that can provide reliable evidence that specific video data was used to train MLLMs. The key idea of our method is to leverage temporal dependencies across video frames to evaluate the model’s memorization of sequential dynamics within the video data, which cannot be inferred solely from general world knowledge or individual image data. Results across ten MLLMs and four benchmarks demonstrate that our method consistently outperforms all baselines in black-box evaluation settings. Code is available at https://github.com/jinruiwang258/VideoMIA.

Anthology ID: 2026.acl-long.1820
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 39235–39248
URL: https://aclanthology.org/2026.acl-long.1820/
Cite (ACL): Jinrui Wang, Zhenfeng Gao, Wendan Wang, Huili Wang, Zichen Qin, Linjie Zhu, Hongke Fu, Shangguang Wang, and Tao Qi. 2026. Black-Box Membership Inference Attacks for Video Training Data in Multimodal Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 39235–39248, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Black-Box Membership Inference Attacks for Video Training Data in Multimodal Large Language Models (Wang et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1820.pdf
Checklist: 2026.acl-long.1820.checklist.pdf
