“Was it “stated” or was it “claimed”?: How linguistic bias affects generative language models

Roma Patel; Ellie Pavlick

doi:10.18653/v1/2021.emnlp-main.790

Abstract

People use language in subtle and nuanced ways to convey their beliefs. For instance, saying “claimed” instead of “said” casts doubt on the truthfulness of the underlying proposition, thus representing the author’s opinion on the matter. Several works have identified such linguistic classes of words that occur frequently in natural language text and are bias-inducing by virtue of their framing effects. In this paper, we test whether generative language models (including GPT-2 (CITATION)) are sensitive to these linguistic framing effects. In particular, we test whether prompts that contain linguistic markers of author bias (e.g., hedges, implicatives, subjective intensifiers, assertives) influence the distribution of the generated text. Although these framing effects are subtle and stylistic, we find evidence that they lead to measurable style and topic differences in the generated text, leading to language that is, on average, more polarised and more skewed towards controversial entities and events.

Anthology ID: 2021.emnlp-main.790
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 10080–10095
URL: https://aclanthology.org/2021.emnlp-main.790/
DOI: 10.18653/v1/2021.emnlp-main.790
Cite (ACL): Roma Patel and Ellie Pavlick. 2021. “Was it “stated” or was it “claimed”?: How linguistic bias affects generative language models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10080–10095, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): “Was it “stated” or was it “claimed”?: How linguistic bias affects generative language models (Patel & Pavlick, EMNLP 2021)
PDF: https://aclanthology.org/2021.emnlp-main.790.pdf
