Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle

Yikang Shen; Shawn Tan; Alessandro Sordoni; Siva Reddy; Aaron Courville

doi:10.18653/v1/2021.naacl-main.132

Abstract

Syntax is fundamental to our thinking about language. Failing to capture the structure of input language could lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard language model (left-to-right). To train the incremental parser and avoid exposure bias, we also propose a novel dynamic oracle, so that SOM is more robust to wrong parsing decisions. Experiments show that SOM can achieve strong results in language modeling, incremental parsing, and syntactic generalization tests while using fewer parameters than other models.

Anthology ID: 2021.naacl-main.132
Volume: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: June
Year: 2021
Address: Online
Editors: Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 1660–1672
URL: https://aclanthology.org/2021.naacl-main.132/
DOI: 10.18653/v1/2021.naacl-main.132
Cite (ACL): Yikang Shen, Shawn Tan, Alessandro Sordoni, Siva Reddy, and Aaron Courville. 2021. Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1660–1672, Online. Association for Computational Linguistics.
Cite (Informal): Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle (Shen et al., NAACL 2021)
PDF: https://aclanthology.org/2021.naacl-main.132.pdf
Optional supplementary code: 2021.naacl-main.132.OptionalSupplementaryCode.zip
Video: https://aclanthology.org/2021.naacl-main.132.mp4