Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions

Authors: Michele Cafagna, Kees van Deemter, Albert Gatt
Published: 2022-12
In: Proceedings of the Workshop on Unimodal and Multimodal Induction of Linguistic Structures (UM-IoS)
Editors: Wenjuan Han, Zilong Zheng, Zhouhan Lin, Lifeng Jin, Yikang Shen, Yoon Kim, Kewei Tu
Publisher: Association for Computational Linguistics
Location: Abu Dhabi, United Arab Emirates (Hybrid)
Type: conference publication (text)
Pages: 56-72
Identifier: cafagna-etal-2022-understanding
DOI: 10.18653/v1/2022.umios-1.6
URL: https://aclanthology.org/2022.umios-1.6/