Coreference by Appearance: Visually Grounded Event Coreference Resolution

Liming Wang; Shengyu Feng; Xudong Lin; Manling Li; Heng Ji; Shih-Fu Chang

doi:10.18653/v1/2021.crac-1.14

Abstract

Event coreference resolution is critical to understanding events in the growing volume of online news that spans multiple modalities, including text, video, and speech. However, the events and entities depicted in different modalities may not be perfectly aligned and can be difficult to annotate, which makes the task especially challenging when little supervision is available. To address these issues, we propose a supervised model based on an attention mechanism and an unsupervised model based on statistical machine translation, capable of learning the relative importance of modalities for event coreference resolution. Experiments on a video multimedia event dataset show that our multimodal models outperform text-only systems in event coreference resolution tasks. A careful analysis reveals that the performance gain of the multimodal model, especially under unsupervised settings, comes from better learning of visually salient events.

Anthology ID: 2021.crac-1.14
Volume: Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Maciej Ogrodniczuk, Sameer Pradhan, Massimo Poesio, Yulia Grishina, Vincent Ng
Venue: CRAC
Publisher: Association for Computational Linguistics
Pages: 132–140
URL: https://aclanthology.org/2021.crac-1.14/
DOI: 10.18653/v1/2021.crac-1.14
Cite (ACL): Liming Wang, Shengyu Feng, Xudong Lin, Manling Li, Heng Ji, and Shih-Fu Chang. 2021. Coreference by Appearance: Visually Grounded Event Coreference Resolution. In Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference, pages 132–140, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Coreference by Appearance: Visually Grounded Event Coreference Resolution (Wang et al., CRAC 2021)
PDF: https://aclanthology.org/2021.crac-1.14.pdf
Video: https://aclanthology.org/2021.crac-1.14.mp4