Extending Phrase Grounding with Pronouns in Visual Dialogues

Panzhong Lu; Xin Zhang; Meishan Zhang; Min Zhang

doi:10.18653/v1/2022.emnlp-main.518

Abstract

Conventional phrase grounding aims to localize the noun phrases mentioned in a given caption to their corresponding image regions, and it has achieved great success recently. However, grounding noun phrases alone is not enough for cross-modal visual language understanding. Here we extend the task by considering pronouns as well. First, we construct a dataset for phrase grounding that links both noun phrases and pronouns to image regions. Based on the dataset, we test the performance of phrase grounding using a state-of-the-art model from the literature. Then, we enhance the baseline grounding model with coreference information, which should help our task, modeling the coreference structures with graph convolutional networks. Experiments on our dataset show that, interestingly, pronouns are easier to ground than noun phrases, possibly because these pronouns are much less ambiguous. Additionally, our final model with coreference information significantly boosts the grounding performance for both noun phrases and pronouns.

Anthology ID: 2022.emnlp-main.518
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 7614–7625
URL: https://aclanthology.org/2022.emnlp-main.518
DOI: 10.18653/v1/2022.emnlp-main.518
Cite (ACL): Panzhong Lu, Xin Zhang, Meishan Zhang, and Min Zhang. 2022. Extending Phrase Grounding with Pronouns in Visual Dialogues. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7614–7625, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Extending Phrase Grounding with Pronouns in Visual Dialogues (Lu et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.518.pdf
