BigTokDetect: A Clinically-Informed Vision–Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok

Minh Duc Chu; Kshitij Pawar; Zihao He; Roxanna Sharifi; Ross M. Sonnenblick; Magdalayna Curry; Laura DAdamo; Lindsay Young; Stuart Murray; Kristina Lerman

BigTokDetect: A Clinically-Informed Vision–Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok

Minh Duc Chu, Kshitij Pawar, Zihao He, Roxanna Sharifi, Ross M. Sonnenblick, Magdalayna Curry, Laura DAdamo, Lindsay Young, Stuart Murray, Kristina Lerman

Abstract

Social media platforms face escalating challenges in detecting harmful content that promotes muscle dysmorphic behaviors and cognitions (bigorexia). This content can evade moderation by camouflaging as legitimate fitness advice and disproportionately affects adolescent males. We address this challenge with BigTokDetect, a clinically informed framework for identifying pro-bigorexia content on TikTok. We introduce BigTok, the first expert-annotated multimodal benchmark dataset of over 2,200 TikTok videos labeled by clinical psychiatrists across five categories and eighteen fine-grained subcategories. Comprehensive evaluation of state-of-the-art vision-language models reveals that while commercial zero-shot models achieve the highest accuracy on broad primary categories, supervised fine-tuning enables smaller open-source models to perform better on fine-grained subcategory detection. Ablation studies show that multimodal fusion improves performance by 5 to 15 percent, with video features providing the most discriminative signals. These findings support a grounded moderation approach that automates detection of explicit harms while flagging ambiguous content for human review, and they establish a scalable framework for harm mitigation in emerging mental health domains.

Anthology ID:: 2026.eacl-long.33
Volume:: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: March
Year:: 2026
Address:: Rabat, Morocco
Editors:: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:: EACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 766–790
Language:
URL:: https://aclanthology.org/2026.eacl-long.33/
DOI:
Bibkey:
Cite (ACL):: Minh Duc Chu, Kshitij Pawar, Zihao He, Roxanna Sharifi, Ross M. Sonnenblick, Magdalayna Curry, Laura DAdamo, Lindsay Young, Stuart Murray, and Kristina Lerman. 2026. BigTokDetect: A Clinically-Informed Vision–Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 766–790, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):: BigTokDetect: A Clinically-Informed Vision–Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok (Chu et al., EACL 2026)
Copy Citation:
PDF:: https://aclanthology.org/2026.eacl-long.33.pdf
Checklist:: 2026.eacl-long.33.checklist.pdf

PDF Cite Search Checklist Fix data