Title: CVT5: Using Compressed Video Encoder and UMT5 for Dense Video Captioning
Authors: Mohammad Javad Pirhadi; Motahhare Mirzaei; Sauleh Eetemadi
Date: 2025-01
Resource type: text
In: Proceedings of the First Workshop of Evaluation of Multi-Modal Generation
Editors: Wei Emma Zhang; Xiang Dai; Desmond Elliot; Byron Fang; Mongyuan Sim; Haojie Zhuang; Weitong Chen
Publisher: Association for Computational Linguistics
Place: Abu Dhabi, UAE
Genre: conference publication
Identifier: pirhadi-etal-2025-cvt5
URL: https://aclanthology.org/2025.evalmg-1.2/
Pages: 10-23