Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, and Li Yuan. 2024. Video-LLaVA: Learning United Visual Representation by Alignment Before Projection. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5971-5984, Miami, Florida, USA, November 2024. Edited by Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen. Association for Computational Linguistics. Conference publication. Citation key: lin-etal-2024-video. DOI: 10.18653/v1/2024.emnlp-main.342. URL: https://aclanthology.org/2024.emnlp-main.342/