利用图像描述与知识图谱增强表示的视觉问答(Exploiting Image Captions and External Knowledge as Representation Enhancement for Visual Question Answering)

Gechao Wang (王屹超), Muhua Zhu (朱慕华), Chen Xu (许晨), Yan Zhang (张琰), Huizhen Wang (王会珍), Jingbo Zhu (朱靖波)


Abstract
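Visual question answering is a multimodal task that requires a deep understanding of both the image and the textual question in order to infer the answer. In many cases, however, simple reasoning over the image and the question alone is not enough to reach the correct answer, and other useful information, such as image captions and external knowledge, can be exploited. To address this, this paper proposes a visual question answering model that uses image captions and external knowledge to enhance its representations. Guided by the question, the model encodes the image and its caption separately with a co-attention mechanism, and it uses knowledge graph embeddings to encode external knowledge into the model, which enriches the feature representation and strengthens the model's reasoning ability. Experiments on the OKVQA dataset show that the proposed method improves accuracy by 1.71% over the baseline system and by 1.88% over the mainstream models from prior work, demonstrating its effectiveness.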
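The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: single-head scaled dot-product attention stands in for the paper's co-attention mechanism, fusion by concatenation is an assumption, and every name here (`attend`, `fuse`, the toy vectors) is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(query, items):
    """Question-guided attention: weight each item vector by its scaled
    dot product with the query, then return the weighted average."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, item) / scale for item in items])
    dim = len(items[0])
    return [sum(w * item[i] for w, item in zip(weights, items)) for i in range(dim)]


def fuse(question, image_regions, caption_tokens, kg_embedding):
    """Assumed fusion step: concatenate the question vector, the attended
    image and caption vectors, and a pre-computed knowledge graph embedding."""
    attended_image = attend(question, image_regions)
    attended_caption = attend(question, caption_tokens)
    return question + attended_image + attended_caption + kg_embedding


# Toy 2-dimensional features; real models would use learned encoders.
question = [1.0, 0.0]
image_regions = [[1.0, 0.0], [0.0, 1.0]]
caption_tokens = [[0.5, 0.5], [0.5, 0.5]]
kg_embedding = [0.2, -0.1]

joint = fuse(question, image_regions, caption_tokens, kg_embedding)
```

In this toy setup the image region aligned with the question receives the larger attention weight, and `joint` would be the input to an answer classifier in a full model.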
Anthology ID:
2021.ccl-1.30
Volume:
Proceedings of the 20th Chinese National Conference on Computational Linguistics
Month:
August
Year:
2021
Address:
Huhhot, China
Editors:
Sheng Li (李生), Maosong Sun (孙茂松), Yang Liu (刘洋), Hua Wu (吴华), Kang Liu (刘康), Wanxiang Che (车万翔), Shizhu He (何世柱), Gaoqi Rao (饶高琦)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
316–326
Language:
Chinese
URL:
https://aclanthology.org/2021.ccl-1.30
Cite (ACL):
Gechao Wang, Muhua Zhu, Chen Xu, Yan Zhang, Huizhen Wang, and Jingbo Zhu. 2021. 利用图像描述与知识图谱增强表示的视觉问答(Exploiting Image Captions and External Knowledge as Representation Enhancement for Visual Question Answering). In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 316–326, Huhhot, China. Chinese Information Processing Society of China.
Cite (Informal):
利用图像描述与知识图谱增强表示的视觉问答(Exploiting Image Captions and External Knowledge as Representation Enhancement for Visual Question Answering) (Wang et al., CCL 2021)
PDF:
https://aclanthology.org/2021.ccl-1.30.pdf
Data
OK-VQA, Visual Question Answering