Nightmare at test time: How punctuation prevents parsers from generalizing

Anders Søgaard, Miryam de Lhoneux, Isabelle Augenstein


Abstract
Punctuation is a strong indicator of syntactic structure, and parsers trained on text with punctuation often rely heavily on this signal. Punctuation is a diversion, however: human language processing does not rely on punctuation to the same extent, and we therefore often leave it out in informal texts. We also use punctuation ungrammatically for emphatic or creative purposes, or simply by mistake. We show that (a) dependency parsers are sensitive both to the absence of punctuation and to alternative uses of it; (b) neural parsers tend to be more sensitive than vintage parsers; (c) neural parsers trained without punctuation outperform all out-of-the-box parsers in every scenario where punctuation departs from standard usage. Our main experiments use synthetically corrupted data, which lets us study the effect of punctuation in isolation and avoid potential confounds, but we also show effects on out-of-domain data.
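The abstract does not say how punctuation is removed from gold trees. Below is a minimal sketch of one way to do it, assuming UD-style annotation in which punctuation tokens carry the `PUNCT` tag and heads are 1-based indices with 0 for the root. `Token` and `strip_punctuation` are hypothetical names for illustration, not the authors' code. Any dependent of a removed punctuation token is reattached to its nearest surviving ancestor, and the remaining heads are re-indexed, so the result is still a well-formed tree.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    form: str
    upos: str   # Assumption: UD universal POS tags; punctuation is "PUNCT"
    head: int   # 1-based index of the head token; 0 means root


def strip_punctuation(sent: List[Token]) -> List[Token]:
    """Hypothetical helper: drop PUNCT tokens and keep the tree well-formed."""
    removed = {i for i, t in enumerate(sent, start=1) if t.upos == "PUNCT"}

    def surviving_head(h: int) -> int:
        # Climb past removed tokens so that a dependent of a punctuation
        # token is attached to the nearest ancestor that is kept.
        while h in removed:
            h = sent[h - 1].head
        return h

    # Map old 1-based indices to new 1-based indices; the root (0) stays 0.
    new_index = {0: 0}
    for old in range(1, len(sent) + 1):
        if old not in removed:
            new_index[old] = len(new_index)

    return [
        Token(t.form, t.upos, new_index[surviving_head(t.head)])
        for i, t in enumerate(sent, start=1)
        if i not in removed
    ]


# "Yes , I know ." with the comma attached to "Yes" and the period to "know"
sent = [
    Token("Yes", "INTJ", 4),
    Token(",", "PUNCT", 1),
    Token("I", "PRON", 4),
    Token("know", "VERB", 0),
    Token(".", "PUNCT", 4),
]
for tok in strip_punctuation(sent):
    print(tok.form, tok.head)
```

Re-indexing is what keeps the stripped sentence usable for both training and evaluation. If a punctuation token were itself the root, this sketch would attach its dependents to 0 and could yield multiple roots; that case is left unhandled here.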
Anthology ID: W18-5404
Volume: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
Month: November
Year: 2018
Address: Brussels, Belgium
Editors: Tal Linzen, Grzegorz Chrupała, Afra Alishahi
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 25–29
URL: https://aclanthology.org/W18-5404
DOI: 10.18653/v1/W18-5404
Cite (ACL): Anders Søgaard, Miryam de Lhoneux, and Isabelle Augenstein. 2018. Nightmare at test time: How punctuation prevents parsers from generalizing. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 25–29, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Nightmare at test time: How punctuation prevents parsers from generalizing (Søgaard et al., EMNLP 2018)
PDF: https://aclanthology.org/W18-5404.pdf