Polyglot Semantic Parsing in APIs

Kyle Richardson, Jonathan Berant, Jonas Kuhn


Abstract
Traditional approaches to semantic parsing (SP) work by training individual models for each available parallel dataset of text-meaning pairs. In this paper, we explore the idea of polyglot semantic translation, or learning semantic parsing models that are trained on multiple datasets and natural languages. In particular, we focus on translating text to code signature representations using the software component datasets of Richardson and Kuhn (2017b,a). The advantage of such models is that they can be used for parsing a wide variety of input natural languages and output programming languages, or mixed input languages, using a single unified model. To facilitate modeling of this type, we develop a novel graph-based decoding framework that achieves state-of-the-art performance on the above datasets, and apply this method to two other benchmark SP tasks.
Anthology ID:
N18-1066
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
720–730
URL:
https://aclanthology.org/N18-1066
DOI:
10.18653/v1/N18-1066
Cite (ACL):
Kyle Richardson, Jonathan Berant, and Jonas Kuhn. 2018. Polyglot Semantic Parsing in APIs. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 720–730, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Polyglot Semantic Parsing in APIs (Richardson et al., NAACL 2018)
PDF:
https://aclanthology.org/N18-1066.pdf
Note:
 N18-1066.Notes.pdf
Video:
 http://vimeo.com/276898099
Code:
 yakazimir/Code-Datasets