CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation

Jungjun Kim, Hanbin Ko, Jialin Wu


Abstract
Daily scenes in the real world are complex due to occlusion, undesired lighting conditions, etc. Although humans handle such complicated environments well, these environments make it challenging for machine learning systems to identify and describe a target without ambiguity. Most previous research focuses on mining discriminating features within the same category as the target object. On the other hand, as the scene becomes more complicated, humans frequently use neighboring objects as complementary information to describe the target. Motivated by this, we propose a novel Complementary Neighboring-based Attention Network (CoNAN) that explicitly utilizes the visual differences between the target object and its highly related neighbors. These highly related neighbors are selected by an attentional ranking module and serve as complementary features that highlight the discriminating aspects of the target object. The speaker module then takes the visual difference features as an additional input to generate the expression. Our qualitative and quantitative results on the RefCOCO, RefCOCO+, and RefCOCOg datasets demonstrate that our generated expressions outperform those of other state-of-the-art models by a clear margin.
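The idea of weighting target-neighbor visual differences by attention can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the scoring function or how features are combined, so the scaled dot-product scoring, the weighted sum of differences, the concatenation step, and all function names below (`attended_difference`, `speaker_input`) are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attended_difference(target, neighbors):
    """Return an attention-weighted sum of (target - neighbor) vectors.

    Assumption: neighbors are scored by scaled dot-product similarity to
    the target. The paper uses a learned attentional ranking module whose
    details are not given in the abstract.
    """
    if not neighbors:
        return [0.0] * len(target), []
    scale = math.sqrt(len(target))
    scores = [dot(target, n) / scale for n in neighbors]
    weights = softmax(scores)
    diff = [
        sum(w * (t - n[i]) for w, n in zip(weights, neighbors))
        for i, t in enumerate(target)
    ]
    return diff, weights


def speaker_input(target, neighbors):
    """Concatenate the target feature with the attended difference feature
    (hypothetical way of passing the difference as an additional input)."""
    diff, _ = attended_difference(target, neighbors)
    return list(target) + diff
```

In this sketch a neighbor that resembles the target receives a larger weight, so its differences from the target dominate the extra feature passed to the speaker. A trained model would replace the fixed similarity score with learned parameters.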
Anthology ID:
2020.coling-main.177
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
1952–1962
URL:
https://aclanthology.org/2020.coling-main.177
DOI:
10.18653/v1/2020.coling-main.177
Cite (ACL):
Jungjun Kim, Hanbin Ko, and Jialin Wu. 2020. CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1952–1962, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation (Kim et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.177.pdf
Data
MS COCO, RefCOCO, Visual Genome