VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks

Authors: Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon
Date: 2020-12
Resource type: text
Genre: conference publication
Published in: Proceedings of the 28th International Conference on Computational Linguistics
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Publisher: International Committee on Computational Linguistics
Location: Barcelona, Spain (Online)
Pages: 3107-3117
Citation key: han-etal-2020-victr
DOI: 10.18653/v1/2020.coling-main.277
URL: https://aclanthology.org/2020.coling-main.277/