Title: Audio-visual training for improved grounding in video-text LLMs
Authors: Shivprasad Rajendra Sagare; Hemachandran S; Kinshuk Sarabhai; Prashant Ullegaddi; Rajeshkumar Sa
Date: 2024-09
Published in: Proceedings of the 17th International Natural Language Generation Conference
Editors: Saad Mahamood; Nguyen Le Minh; Daphne Ippolito
Publisher: Association for Computational Linguistics
Location: Tokyo, Japan
Genre: conference publication
Identifier: sagare-etal-2024-audio-visual
DOI: 10.18653/v1/2024.inlg-main.36
URL: https://aclanthology.org/2024.inlg-main.36/
Pages: 440-445