Title: A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions
Authors: Jack Hessel; Bo Pang; Zhenhai Zhu; Radu Soricut
Date: 2019-11
Resource type: text
Genre: conference publication
Published in: Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
Editors: Mohit Bansal; Aline Villavicencio
Publisher: Association for Computational Linguistics
Location: Hong Kong, China
Pages: 419-429
Identifier: hessel-etal-2019-case
DOI: 10.18653/v1/K19-1039
URL: https://aclanthology.org/K19-1039/