Mitigating Dataset Artifacts in Natural Language Inference Through Automatic Contextual Data Augmentation and Learning Optimization

Michail Mersinias; Panagiotis Valvis

Mitigating Dataset Artifacts in Natural Language Inference Through Automatic Contextual Data Augmentation and Learning Optimization

Abstract

In recent years, natural language inference has been an emerging research area. In this paper, we present a novel data augmentation technique and combine it with a unique learning procedure for that task. Our so-called automatic contextual data augmentation (acda) method manages to be fully automatic, non-trivially contextual, and computationally efficient at the same time. When compared to established data augmentation methods, it is substantially more computationally efficient and requires no manual annotation by a human expert as they usually do. In order to increase its efficiency, we combine acda with two learning optimization techniques: contrastive learning and a hybrid loss function. The former maximizes the benefit of the supervisory signal generated by acda, while the latter incentivises the model to learn the nuances of the decision boundary. Our combined approach is shown experimentally to provide an effective way for mitigating spurious data correlations within a dataset, called dataset artifacts, and as a result improves performance. Specifically, our experiments verify that acda-boosted pre-trained language models that employ our learning optimization techniques, consistently outperform the respective fine-tuned baseline pre-trained language models across both benchmark datasets and adversarial examples.

Anthology ID:: 2022.lrec-1.45
Volume:: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:: June
Year:: 2022
Address:: Marseille, France
Editors:: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:: LREC
SIG:
Publisher:: European Language Resources Association
Note:
Pages:: 427–435
Language:
URL:: https://aclanthology.org/2022.lrec-1.45/
DOI:
Bibkey:
Cite (ACL):: Michail Mersinias and Panagiotis Valvis. 2022. Mitigating Dataset Artifacts in Natural Language Inference Through Automatic Contextual Data Augmentation and Learning Optimization. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 427–435, Marseille, France. European Language Resources Association.
Cite (Informal):: Mitigating Dataset Artifacts in Natural Language Inference Through Automatic Contextual Data Augmentation and Learning Optimization (Mersinias & Valvis, LREC 2022)
Copy Citation:
PDF:: https://aclanthology.org/2022.lrec-1.45.pdf

PDF Cite Search Fix data