Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference

Grace Proebsting, Adam Poliak


Abstract
We test whether NLP datasets created with Large Language Models (LLMs) contain annotation artifacts and social biases, just as NLP datasets elicited from crowdsource workers do. We recreate a portion of the Stanford Natural Language Inference corpus using GPT-4, Llama-2 70b Chat, and Mistral 7b Instruct. We train hypothesis-only classifiers to determine whether LLM-elicited NLI datasets contain annotation artifacts. Next, we use pointwise mutual information (PMI) to identify the words in each dataset that are associated with gender, race, and age-related terms. On our LLM-generated NLI datasets, fine-tuned BERT hypothesis-only classifiers achieve between 86% and 96% accuracy. Our analyses further characterize the annotation artifacts and stereotypical biases in LLM-generated datasets.
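The abstract's bias analysis relies on pointwise mutual information between vocabulary words and demographic terms. As a minimal sketch of that idea (the paper's exact co-occurrence definition, smoothing, and term lists are not given here, so the sentence-level counting, the `min_count` threshold, and the example word sets below are assumptions):

import math
from collections import Counter

def pmi_scores(hypotheses, group_terms, min_count=10):
    """Score each vocabulary word by its PMI with a set of demographic
    terms (e.g., gendered words), computed over tokenized hypotheses.
    PMI(w, g) = log[ p(w, g) / (p(w) * p(g)) ], with co-occurrence
    counted at the sentence level (an assumption for illustration)."""
    word_counts = Counter()   # sentences containing word w
    joint_counts = Counter()  # sentences containing w AND a group term
    group_sentences = 0       # sentences containing any group term
    total = len(hypotheses)

    for sent in hypotheses:
        tokens = set(sent.lower().split())
        has_group = bool(tokens & group_terms)
        group_sentences += has_group
        for w in tokens:
            word_counts[w] += 1
            if has_group:
                joint_counts[w] += 1

    p_group = group_sentences / total
    scores = {}
    for w, c in word_counts.items():
        if c < min_count or joint_counts[w] == 0:
            continue
        p_w = c / total
        p_joint = joint_counts[w] / total
        scores[w] = math.log(p_joint / (p_w * p_group))
    return scores

# Hypothetical usage: rank words most associated with gendered terms
# in a set of LLM-generated hypotheses (toy data, not from the paper).
gendered = {"he", "she", "man", "woman", "his", "her"}
hyps = ["A man is playing guitar .", "She is cooking dinner ."]
print(sorted(pmi_scores(hyps, gendered, min_count=1).items(),
             key=lambda kv: -kv[1])[:10])

Words with high PMI against a demographic term set are candidates for stereotypical associations; the paper reports such associations for gender, race, and age-related terms.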
Anthology ID:
2025.coling-main.389
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
5836–5851
URL:
https://aclanthology.org/2025.coling-main.389/
Cite (ACL):
Grace Proebsting and Adam Poliak. 2025. Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5836–5851, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference (Proebsting & Poliak, COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.389.pdf