Mimicking How Humans Interpret Out-of-Context Sentences Through Controlled Toxicity Decoding

Maria Mihaela Trusca; Liesbeth Allein

doi:10.18653/v1/2025.trustnlp-main.19

Abstract

Interpretations of a single sentence can vary, particularly when its context is lost. This paper aims to simulate how readers perceive content with varying toxicity levels by generating diverse interpretations of out-of-context sentences. By modeling toxicity, we can anticipate misunderstandings and reveal hidden toxic meanings. Our proposed decoding strategy explicitly controls toxicity in the set of generated interpretations by (i) aligning interpretation toxicity with the input, (ii) relaxing toxicity constraints for more toxic input sentences, and (iii) promoting diversity in toxicity levels within the set of generated interpretations. Experimental results show that our method improves alignment with human-written interpretations in both syntax and semantics while reducing model prediction uncertainty.
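The abstract names three toxicity constraths but not how they are computed. The sketch below is a hypothetical illustration of how constraints (i) to (iii) could interact; it is not the authors' method. It reranks already-generated candidate interpretations instead of steering generation, so it only approximates what a decoding strategy would do. The per-candidate toxicity scores (assumed to come from some external classifier), the tolerance schedule (`base_tol`, `relax`), the weights (`align_weight`, `div_weight`), and the greedy set selection are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    log_prob: float  # model score of the candidate interpretation (assumed given)
    toxicity: float  # toxicity in [0, 1] from an external classifier (assumed)


def alignment_penalty(cand_tox: float, input_tox: float,
                      base_tol: float = 0.1, relax: float = 0.3) -> float:
    """Hypothetical penalty: toxicity gap beyond a tolerance band."""
    # (ii) the tolerance widens as the input becomes more toxic
    tolerance = base_tol + relax * input_tox
    # (i) only gaps larger than the tolerance are penalized
    return max(0.0, abs(cand_tox - input_tox) - tolerance)


def select_interpretations(cands, input_tox, k=3,
                           align_weight=5.0, div_weight=1.0):
    """Greedily pick k candidates, trading off model score,
    toxicity alignment, and toxicity diversity within the set."""
    selected = []
    pool = list(cands)
    while pool and len(selected) < k:
        def score(c):
            s = c.log_prob - align_weight * alignment_penalty(c.toxicity, input_tox)
            if selected:
                # (iii) reward toxicity levels far from those already chosen
                s += div_weight * min(abs(c.toxicity - o.toxicity) for o in selected)
            return s
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected


# Usage: four hypothetical candidates for an input with toxicity 0.6.
cands = [
    Candidate("A", log_prob=-1.0, toxicity=0.60),
    Candidate("B", log_prob=-1.1, toxicity=0.62),
    Candidate("C", log_prob=-1.3, toxicity=0.35),
    Candidate("D", log_prob=-0.8, toxicity=0.05),
]
chosen = select_interpretations(cands, input_tox=0.6, k=3)
print([c.text for c in chosen])  # → ['A', 'C', 'B']
```

In this toy run, "D" has the best model score but falls outside the widened tolerance band and is dropped, while the diversity term lets "C" (toxicity 0.35) enter the set ahead of "B", whose toxicity nearly duplicates that of "A".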

Anthology ID: 2025.trustnlp-main.19
Volume: Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)
Month: May
Year: 2025
Address: Albuquerque, New Mexico
Editors: Trista Cao, Anubrata Das, Tharindu Kumarage, Yixin Wan, Satyapriya Krishna, Ninareh Mehrabi, Jwala Dhamala, Anil Ramakrishna, Aram Galystan, Anoop Kumar, Rahul Gupta, Kai-Wei Chang
Venues: TrustNLP | WS
Publisher: Association for Computational Linguistics
Pages: 291–297
URL: https://aclanthology.org/2025.trustnlp-main.19/
DOI: 10.18653/v1/2025.trustnlp-main.19
Cite (ACL): Maria Mihaela Trusca and Liesbeth Allein. 2025. Mimicking How Humans Interpret Out-of-Context Sentences Through Controlled Toxicity Decoding. In Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025), pages 291–297, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Mimicking How Humans Interpret Out-of-Context Sentences Through Controlled Toxicity Decoding (Trusca & Allein, TrustNLP 2025)
PDF: https://aclanthology.org/2025.trustnlp-main.19.pdf