SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects

Anja Belz, Adrian Muscat, Pierre Anguill, Mouhamadou Sow, Gaétan Vincent, Yassine Zinessabah


Abstract
We present SpatialVOC2K, the first multilingual image dataset with spatial relation annotations and object features for image-to-text generation, built using 2,026 images from the PASCAL VOC2008 dataset. The dataset incorporates (i) the labelled object bounding boxes from VOC2008, (ii) geometrical, language and depth features for each object, and (iii) for each pair of objects in both orders, (a) the single best preposition and (b) the set of possible prepositions in the given language that describe the spatial relationship between the two objects. Compared to previous versions of the dataset, we have roughly doubled the size for French, and completely reannotated as well as increased the size of the English portion, providing single best prepositions for English for the first time. Furthermore, we have added explicit 3D depth features for objects. We are releasing our dataset for free reuse, along with evaluation tools to enable comparative evaluation.
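The annotation layout described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the dataset's actual file format or API: the class names, field names, bounding-box coordinates, and example prepositions below are all hypothetical. It shows one record per ordered pair of labelled objects, each carrying a single best preposition and a set of possible prepositions.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class LabelledObject:
    """A labelled object with a bounding box (hypothetical field names)."""
    label: str
    bbox: Tuple[int, int, int, int]  # assumed order: (xmin, ymin, xmax, ymax)


@dataclass
class PairAnnotation:
    """Annotation for one ORDERED pair of objects in one language."""
    first: LabelledObject
    second: LabelledObject
    best_preposition: Optional[str] = None
    possible_prepositions: Set[str] = field(default_factory=set)


def ordered_pairs(objects: List[LabelledObject]) -> List[PairAnnotation]:
    """Create an empty annotation for every pair of objects, in both orders."""
    return [PairAnnotation(a, b) for a, b in permutations(objects, 2)]


# Illustrative values only; they are not taken from the dataset.
objects = [
    LabelledObject("person", (48, 30, 120, 210)),
    LabelledObject("dog", (110, 150, 200, 220)),
]
pairs = ordered_pairs(objects)
pairs[0].best_preposition = "next to"
pairs[0].possible_prepositions = {"next to", "near"}
```

Two objects yield two records, (person, dog) and (dog, person), which mirrors the "each pair of objects in both orders" design: the preposition that fits one order need not fit the other.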
Anthology ID:
W18-6516
Volume:
Proceedings of the 11th International Conference on Natural Language Generation
Month:
November
Year:
2018
Address:
Tilburg University, The Netherlands
Editors:
Emiel Krahmer, Albert Gatt, Martijn Goudbeek
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
140–145
URL:
https://aclanthology.org/W18-6516
DOI:
10.18653/v1/W18-6516
Cite (ACL):
Anja Belz, Adrian Muscat, Pierre Anguill, Mouhamadou Sow, Gaétan Vincent, and Yassine Zinessabah. 2018. SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects. In Proceedings of the 11th International Conference on Natural Language Generation, pages 140–145, Tilburg University, The Netherlands. Association for Computational Linguistics.
Cite (Informal):
SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects (Belz et al., INLG 2018)
PDF:
https://aclanthology.org/W18-6516.pdf
Code
muskata/SpatialVOC2K
Data
SpatialVOC2K, Flickr30k, VRD, Visual Genome