Scene Graph Parsing via Abstract Meaning Representation in Pre-trained Language Models

Woo Suk Choi; Yu-Jung Heo; Dharani Punithan; Byoung-Tak Zhang

doi:10.18653/v1/2022.dlg4nlp-1.4

Abstract

In this work, we propose applying abstract meaning representation (AMR) based semantic parsing models to parse textual descriptions of a visual scene into scene graphs; to the best of our knowledge, this is the first work to do so. Previous works examined scene graph parsing from textual descriptions using dependency parsing and left the AMR parsing approach as future work, since applying AMR requires sophisticated methods. We therefore use pre-trained AMR parsing models to parse the region descriptions of visual scenes (i.e., images) into AMR graphs, and pre-trained language models (PLMs), BART and T5, to parse the AMR graphs into scene graphs. The experimental results show that our approach explicitly captures high-level semantics from textual descriptions of visual scenes, such as objects, attributes of objects, and relationships between objects. Our textual scene graph parsing approach outperforms the previous state-of-the-art results by 9.3% in the SPICE metric score.
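To make the target structure concrete, the following is a minimal sketch of a scene graph as a set of objects, object attributes, and relationships, together with a simplified tuple-level F1 in the spirit of SPICE. It is an illustration under stated assumptions, not the authors' implementation: the class name `SceneGraph`, the function `tuple_f1`, and the example sentence are hypothetical, and the score uses exact string matching, whereas the real SPICE metric also matches synonyms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneGraph:
    """Hypothetical container for the three kinds of scene graph elements."""
    objects: frozenset      # e.g. {"boy", "horse"}
    attributes: frozenset   # (object, attribute) pairs
    relations: frozenset    # (subject, predicate, object) triples

    def tuples(self):
        # Flatten the graph into one set of tuples: 1-tuples for objects,
        # 2-tuples for attributes, 3-tuples for relations.
        return ({(o,) for o in self.objects}
                | set(self.attributes)
                | set(self.relations))


def tuple_f1(predicted, reference):
    """Simplified SPICE-like score: F1 over exactly matching tuples.

    Assumption: exact matching only; no synonym or lemma matching.
    """
    pred, ref = predicted.tuples(), reference.tuples()
    matched = len(pred & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical region description: "a young boy riding a brown horse"
reference = SceneGraph(
    objects=frozenset({"boy", "horse"}),
    attributes=frozenset({("boy", "young"), ("horse", "brown")}),
    relations=frozenset({("boy", "riding", "horse")}),
)
# A parser output that misses the attribute "young"
predicted = SceneGraph(
    objects=frozenset({"boy", "horse"}),
    attributes=frozenset({("horse", "brown")}),
    relations=frozenset({("boy", "riding", "horse")}),
)

score = tuple_f1(predicted, reference)
print(f"{score:.3f}")
```

Here the predicted graph recovers four of the five reference tuples with no spurious ones, so precision is 1.0 and recall is 0.8.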

Anthology ID: 2022.dlg4nlp-1.4
Volume: Proceedings of the 2nd Workshop on Deep Learning on Graphs for Natural Language Processing (DLG4NLP 2022)
Month: July
Year: 2022
Address: Seattle, Washington
Editors: Lingfei Wu, Bang Liu, Rada Mihalcea, Jian Pei, Yue Zhang, Yunyao Li
Venue: DLG4NLP
Publisher: Association for Computational Linguistics
Pages: 30–35
URL: https://aclanthology.org/2022.dlg4nlp-1.4/
DOI: 10.18653/v1/2022.dlg4nlp-1.4
Cite (ACL): Woo Suk Choi, Yu-Jung Heo, Dharani Punithan, and Byoung-Tak Zhang. 2022. Scene Graph Parsing via Abstract Meaning Representation in Pre-trained Language Models. In Proceedings of the 2nd Workshop on Deep Learning on Graphs for Natural Language Processing (DLG4NLP 2022), pages 30–35, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal): Scene Graph Parsing via Abstract Meaning Representation in Pre-trained Language Models (Choi et al., DLG4NLP 2022)
PDF: https://aclanthology.org/2022.dlg4nlp-1.4.pdf
Data: MS COCO, Visual Genome
