Personalized Video Comment Generation

Xudong Lin; Ali Zare; Shiyuan Huang; Ming-Hsuan Yang; Shih-Fu Chang; Li Zhang

doi:10.18653/v1/2024.findings-emnlp.979

Abstract

Generating personalized responses, particularly in the context of video, poses a unique challenge for language models. This paper introduces the novel task of Personalized Video Comment Generation (PVCG), which aims to predict user comments tailored to both the input video and the user’s comment history, where the user is unseen during model training. Existing video captioning tasks ignore personalization in the text generation process, so we introduce PerVidCom, a new dataset collected specifically for this task with diverse personalized comments from YouTube. Recognizing the limitations of existing captioning metrics for evaluating this task, we propose a new automatic metric based on Large Language Models (LLMs) with few-shot in-context learning, named FICL-Score, which measures quality in terms of emotion, language style, and content relevance. We verify the proposed metric with human evaluations. We establish baselines using prominent Multimodal LLMs (MLLMs), analyze their performance discrepancies through extensive evaluation, and identify directions for future improvement on this important task. Our research opens up a new direction of personalizing MLLMs and paves the way for future research.
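To make the idea of an LLM-based metric with few-shot in-context learning more concrete, the following is a minimal Python sketch of how such a scoring prompt might be assembled and how a reply might be parsed. It is not the paper's FICL-Score implementation: the abstract does not give the prompt wording, the rating scale, or the model, so the 1 to 5 scale, the prompt layout, and every name here (RatedExample, build_prompt, parse_scores) are assumptions for illustration. The call to the LLM itself is deliberately left out.

```python
import re
from dataclasses import dataclass

# The three aspects named in the abstract.
ASPECTS = ("emotion", "language style", "content relevance")


@dataclass
class RatedExample:
    """One hypothetical in-context demonstration with hand-assigned scores."""
    history: list            # earlier comments written by the user
    video_description: str   # text stand-in for the video (an assumption)
    candidate: str           # the comment being rated
    scores: dict             # aspect name -> integer score (assumed 1 to 5)


def format_case(history, video_description, candidate):
    """Render one case (history, video, candidate comment) as plain text."""
    lines = ["User comment history:"]
    lines += [f"- {comment}" for comment in history]
    lines.append(f"Video: {video_description}")
    lines.append(f"Candidate comment: {candidate}")
    return "\n".join(lines)


def build_prompt(demos, history, video_description, candidate):
    """Concatenate rated demonstrations, then the unrated case to be scored."""
    header = "Rate the candidate comment from 1 to 5 on: " + ", ".join(ASPECTS) + "."
    blocks = [header]
    for demo in demos:
        rated = "\n".join(f"{aspect}: {demo.scores[aspect]}" for aspect in ASPECTS)
        case = format_case(demo.history, demo.video_description, demo.candidate)
        blocks.append(case + "\n" + rated)
    blocks.append(format_case(history, video_description, candidate))
    return "\n\n".join(blocks)


def parse_scores(reply):
    """Extract 'aspect: N' pairs from a model reply; missing aspects are skipped."""
    scores = {}
    for aspect in ASPECTS:
        match = re.search(rf"{re.escape(aspect)}\s*:\s*([1-5])", reply, re.IGNORECASE)
        if match:
            scores[aspect] = int(match.group(1))
    return scores
```

In use, the string returned by build_prompt would be sent to whichever LLM serves as the judge, and parse_scores would be applied to its reply. Keeping the three aspects as separate scores, rather than one combined number, mirrors the abstract's description of the metric as measuring emotion, language style, and content relevance.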

Anthology ID: 2024.findings-emnlp.979
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 16806–16820
URL: https://aclanthology.org/2024.findings-emnlp.979/
DOI: 10.18653/v1/2024.findings-emnlp.979
Cite (ACL): Xudong Lin, Ali Zare, Shiyuan Huang, Ming-Hsuan Yang, Shih-Fu Chang, and Li Zhang. 2024. Personalized Video Comment Generation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 16806–16820, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Personalized Video Comment Generation (Lin et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.979.pdf
