Reliability Check: An Analysis of GPT-3’s Response to Sensitive Topics and Prompt Wording

Aisha Khatun, Daniel Brown


Abstract
Large language models (LLMs) have become mainstream technology thanks to their versatile use cases and impressive performance. Despite countless out-of-the-box applications, LLMs are still not reliable. Much work goes into improving the factual accuracy, consistency, and ethical standards of these models through fine-tuning, prompting, and Reinforcement Learning from Human Feedback (RLHF), but no systematic analysis exists of how these models respond to different categories of statements, or of their potential vulnerabilities to simple changes in prompt wording. In this work, we analyze what confuses GPT-3: how the model responds to certain sensitive topics and what effect prompt wording has on its responses. We find that GPT-3 correctly disagrees with obvious Conspiracies and Stereotypes but makes mistakes with common Misconceptions and Controversies. Model responses are inconsistent across prompts and settings, highlighting GPT-3’s unreliability.
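The kind of probe described in the abstract can be illustrated with a minimal sketch: ask a GPT-3 model about the same statement under several prompt wordings and compare the answers for consistency. This is not the paper’s exact protocol; the model name, statement, and prompt templates below are hypothetical, and the snippet assumes the legacy openai Python SDK (pre-1.0) Completions interface.

```python
# Illustrative sketch only: probe a GPT-3 model with one statement under
# several prompt wordings and compare the yes/no answers for consistency.
# Assumes the legacy openai Python SDK (<1.0); model, statement, and
# templates are hypothetical, not the authors' exact setup.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

STATEMENT = "The Great Wall of China is visible from space."  # example misconception

# Different wordings of the same question, to test prompt sensitivity.
PROMPT_TEMPLATES = [
    "Is the following statement true? {s}\nAnswer yes or no:",
    "{s}\nDo you agree with the above statement? Answer yes or no:",
    "Statement: {s}\nTrue or false?",
]

for template in PROMPT_TEMPLATES:
    prompt = template.format(s=STATEMENT)
    resp = openai.Completion.create(
        model="text-davinci-003",  # assumed GPT-3 variant
        prompt=prompt,
        temperature=0,             # greedy decoding for comparability
        max_tokens=5,
    )
    answer = resp["choices"][0]["text"].strip()
    print(f"{template!r} -> {answer}")
```

If the three answers disagree, the model’s judgment on that statement is prompt-sensitive, which is the kind of inconsistency the paper reports.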
Anthology ID:
2023.trustnlp-1.8
Volume:
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anaelia Ovalle, Kai-Wei Chang, Ninareh Mehrabi, Yada Pruksachatkun, Aram Galstyan, Jwala Dhamala, Apurv Verma, Trista Cao, Anoop Kumar, Rahul Gupta
Venue:
TrustNLP
Publisher:
Association for Computational Linguistics
Pages:
73–95
URL:
https://aclanthology.org/2023.trustnlp-1.8
DOI:
10.18653/v1/2023.trustnlp-1.8
Bibkey:
Cite (ACL):
Aisha Khatun and Daniel Brown. 2023. Reliability Check: An Analysis of GPT-3’s Response to Sensitive Topics and Prompt Wording. In Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), pages 73–95, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Reliability Check: An Analysis of GPT-3’s Response to Sensitive Topics and Prompt Wording (Khatun & Brown, TrustNLP 2023)
PDF:
https://aclanthology.org/2023.trustnlp-1.8.pdf
Video:
https://aclanthology.org/2023.trustnlp-1.8.mp4