Title: Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Authors: Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach
Date: 2016-11
Type: text (conference publication)
Published in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
Editors: Jian Su, Kevin Duh, Xavier Carreras
Publisher: Association for Computational Linguistics
Place: Austin, Texas
Pages: 457-468
Identifier: fukui-etal-2016-multimodal
DOI: 10.18653/v1/D16-1044
URL: https://aclanthology.org/D16-1044/