An Analysis of Action Recognition Datasets for Language and Vision Tasks

Spandana Gella, Frank Keller


Abstract
A large amount of recent research has focused on tasks that combine language and vision, resulting in a proliferation of datasets and methods. One such task is action recognition, whose applications include image annotation, scene understanding and image retrieval. In this survey, we categorize the existing approaches based on how they conceptualize this problem and provide a detailed review of existing datasets, highlighting their diversity as well as advantages and disadvantages. We focus on recently developed datasets which link visual information with linguistic resources and provide a fine-grained syntactic and semantic analysis of actions in images.
Anthology ID:
P17-2011
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
64–71
URL:
https://aclanthology.org/P17-2011/
DOI:
10.18653/v1/P17-2011
Cite (ACL):
Spandana Gella and Frank Keller. 2017. An Analysis of Action Recognition Datasets for Language and Vision Tasks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 64–71, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
An Analysis of Action Recognition Datasets for Language and Vision Tasks (Gella & Keller, ACL 2017)
PDF:
https://aclanthology.org/P17-2011.pdf
Video:
https://aclanthology.org/P17-2011.mp4
Data
FrameNet, HICO, MPII Human Pose, Verse, Visual Genome, Visual Question Answering