Syntactic Data Augmentation Increases Robustness to Inference Heuristics

Junghyun Min; R. Thomas McCoy; Dipanjan Das; Emily Pitler; Tal Linzen

doi:10.18653/v1/2020.acl-main.212

Abstract

Pretrained neural models such as BERT, when fine-tuned to perform natural language inference (NLI), often show high accuracy on standard datasets, but display a surprising lack of sensitivity to word order on controlled challenge sets. We hypothesize that this issue is not primarily caused by the pretrained model’s limitations, but rather by the paucity of crowdsourced NLI examples that might convey the importance of syntactic structure at the fine-tuning stage. We explore several methods to augment standard training sets with syntactically informative examples, generated by applying syntactic transformations to sentences from the MNLI corpus. The best-performing augmentation method, subject/object inversion, improved BERT’s accuracy on controlled examples that diagnose sensitivity to word order from 0.28 to 0.73, without affecting performance on the MNLI test set. This improvement generalized beyond the particular construction used for data augmentation, suggesting that augmentation causes BERT to recruit abstract syntactic representations.

Anthology ID: 2020.acl-main.212
Volume: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2020
Address: Online
Editors: Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 2339–2352
URL: https://aclanthology.org/2020.acl-main.212/
DOI: 10.18653/v1/2020.acl-main.212
Cite (ACL): Junghyun Min, R. Thomas McCoy, Dipanjan Das, Emily Pitler, and Tal Linzen. 2020. Syntactic Data Augmentation Increases Robustness to Inference Heuristics. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2339–2352, Online. Association for Computational Linguistics.
Cite (Informal): Syntactic Data Augmentation Increases Robustness to Inference Heuristics (Min et al., ACL 2020)
PDF: https://aclanthology.org/2020.acl-main.212.pdf
Video: http://slideslive.com/38928832
