Profanity-Avoiding Training Framework for Seq2seq Models with Certified Robustness

Hengtong Zhang; Tianhang Zheng; Yaliang Li; Jing Gao; Lu Su; Bo Li

doi:10.18653/v1/2021.emnlp-main.418

Abstract

Seq2seq models have proven highly effective in a wide variety of applications. However, recent research has shown that inappropriate language in training samples and carefully crafted test cases can induce seq2seq models to output profanity. Such outputs can hurt the usability of seq2seq models and offend end users. To address this problem, we propose a training framework with certified robustness that eliminates the causes that trigger the generation of profanity. The proposed framework uses only a short list of profanity examples to prevent seq2seq models from generating a broader spectrum of profanity. It consists of a pattern-eliminating training component, which suppresses the impact of profanity-bearing language patterns in the training set, and a trigger-resisting training component, which provides certified robustness against profanity-triggering expressions intentionally injected into test samples. In the experiments, we consider two representative NLP tasks to which seq2seq models can be applied: style transfer and dialogue generation. Extensive experimental results show that the proposed training framework can successfully prevent the NLP models from generating profanity.

Anthology ID: 2021.emnlp-main.418
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 5151–5161
URL: https://aclanthology.org/2021.emnlp-main.418
DOI: 10.18653/v1/2021.emnlp-main.418
Cite (ACL): Hengtong Zhang, Tianhang Zheng, Yaliang Li, Jing Gao, Lu Su, and Bo Li. 2021. Profanity-Avoiding Training Framework for Seq2seq Models with Certified Robustness. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5151–5161, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Profanity-Avoiding Training Framework for Seq2seq Models with Certified Robustness (Zhang et al., EMNLP 2021)
PDF: https://aclanthology.org/2021.emnlp-main.418.pdf
