Title: Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Authors: Hang Zhang, Xin Li, Lidong Bing
Published in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Editors: Yansong Feng, Els Lefever
Publisher: Association for Computational Linguistics
Location: Singapore
Date: 2023-12
Genre: conference publication
Pages: 543-553
Citation key: zhang-etal-2023-video
DOI: 10.18653/v1/2023.emnlp-demo.49
URL: https://aclanthology.org/2023.emnlp-demo.49/