CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration

Rachneet Sachdeva, Martin Tutek, Iryna Gurevych


Abstract
In recent years, large language models (LLMs) have shown remarkable capabilities at scale, particularly at generating text conditioned on a prompt. In our work, we investigate the use of LLMs to augment the training data of smaller language models (SLMs) with automatically generated counterfactual (CF) instances – i.e., minimally altered inputs – in order to improve the out-of-domain (OOD) performance of SLMs in the extractive question answering (QA) setup. We show that, across various LLM generators, such data augmentation consistently enhances OOD performance and improves model calibration for both confidence-based and rationale-augmented calibrator models. Furthermore, these performance improvements correlate with higher diversity of CF instances in terms of their surface form and semantic content. Finally, we show that CF-augmented models which are easier to calibrate also exhibit much lower entropy when assigning importance, indicating that rationale-augmented calibrators prefer concise explanations.
Anthology ID:
2024.eacl-long.113
Volume:
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1876–1898
URL:
https://aclanthology.org/2024.eacl-long.113
Cite (ACL):
Rachneet Sachdeva, Martin Tutek, and Iryna Gurevych. 2024. CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1876–1898, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration (Sachdeva et al., EACL 2024)
PDF:
https://aclanthology.org/2024.eacl-long.113.pdf
Software:
2024.eacl-long.113.software.zip
Video:
https://aclanthology.org/2024.eacl-long.113.mp4