Visually Guided Spatial Relation Extraction from Text

Taher Rahgooy, Umar Manzoor, Parisa Kordjamshidi


Abstract
Extracting spatial relations from sentences with complex or nested relationships is very challenging, as it often requires resolving inherent semantic ambiguities. We draw on the visual modality to fill the information gap in the text modality and resolve spatial semantic ambiguities. We use various recent vision and language datasets and techniques to train inter-modality alignment models and visual relationship classifiers, and we propose a novel global inference model that integrates these components into our structured output prediction model for spatial role and relation extraction. Our global inference model enables us to utilize the visual and geometric relationships between objects and improves on the state-of-the-art results for spatial information extraction from text.
Anthology ID:
N18-2124
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
788–794
URL:
https://aclanthology.org/N18-2124
DOI:
10.18653/v1/N18-2124
Cite (ACL):
Taher Rahgooy, Umar Manzoor, and Parisa Kordjamshidi. 2018. Visually Guided Spatial Relation Extraction from Text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 788–794, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Visually Guided Spatial Relation Extraction from Text (Rahgooy et al., NAACL 2018)
PDF:
https://aclanthology.org/N18-2124.pdf
Data
Visual Genome