Improving the OOD Performance of Closed-Source LLMs on NLI Through Strategic Data Selection

Joe Stacey; Lisa Alazraki; Aran Ubhi; Beyza Ermis; Aaron Mueller; Marek Rei

Abstract

We investigate the robustness of fine-tuned Large Language Models (LLMs) for the task of Natural Language Inference (NLI), finding that the in-distribution gains from fine-tuning correspond to a large drop in out-of-distribution (OOD) performance. Despite the widespread use of closed-source LLMs, there are no robustness mitigation methods that work under their API fine-tuning constraints. Existing methods to improve robustness typically require changing the fine-tuning process or large-scale data augmentation, both of which are infeasible or cost-prohibitive for closed-source models. To address this, we propose strategically selecting the NLI fine-tuning data, prioritising more complex examples or replacing existing training examples with LLM-generated data. Prioritising more complex training examples improves performance on challenging OOD NLI datasets, while training with synthetic data leads to substantial improvements on easier OOD datasets. We find that synthetic examples are often too simple, and that by prompting LLMs to create more complex synthetic data we can improve performance on both easy and challenging OOD datasets. Finally, we show that recent autoregressive LLMs are substantially more robust to distributional shifts than encoder models, and should be a preferred baseline for future research.

Anthology ID: 2026.findings-eacl.286
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5378–5404
URL: https://aclanthology.org/2026.findings-eacl.286/
Cite (ACL): Joe Stacey, Lisa Alazraki, Aran Ubhi, Beyza Ermis, Aaron Mueller, and Marek Rei. 2026. Improving the OOD Performance of Closed-Source LLMs on NLI Through Strategic Data Selection. In Findings of the Association for Computational Linguistics: EACL 2026, pages 5378–5404, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Improving the OOD Performance of Closed-Source LLMs on NLI Through Strategic Data Selection (Stacey et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-eacl.286.pdf
Checklist: 2026.findings-eacl.286.checklist.pdf
