Semi-supervised Adversarial Text Generation based on Seq2Seq models

Hieu Le; Dieu-Thu Le; Verena Weber; Chris Church; Kay Rottmann; Melanie Bradford; Peter Chin

doi:10.18653/v1/2022.emnlp-industry.26

Abstract

To improve deep learning models’ robustness, adversarial training has been frequently used in computer vision with satisfying results. However, adversarial perturbations on text have turned out to be more challenging due to the discrete nature of text. The generated adversarial text might not sound natural or might not preserve semantics, which is key for real-world applications where text classification is based on semantic meaning. In this paper, we describe a new way of generating adversarial samples by using pseudo-labeled in-domain text data to train a seq2seq model for adversarial generation and combining it with paraphrase detection. We showcase the benefit of our approach for a real-world Natural Language Understanding (NLU) task, which maps a user’s request to an intent. Furthermore, we experiment with gradient-based training for the NLU task and try using token importance scores to guide the adversarial text generation. We show that our approach can generate realistic and relevant adversarial samples compared to other state-of-the-art adversarial training methods. Applying adversarial training using these generated samples helps the NLU model to recover up to 70% of these types of errors and makes the model more robust, especially in the tail distribution in a large-scale real-world application.

Anthology ID: 2022.emnlp-industry.26
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: December
Year: 2022
Address: Abu Dhabi, UAE
Editors: Yunyao Li, Angeliki Lazaridou
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 254–262
URL: https://aclanthology.org/2022.emnlp-industry.26
DOI: 10.18653/v1/2022.emnlp-industry.26
Cite (ACL): Hieu Le, Dieu-thu Le, Verena Weber, Chris Church, Kay Rottmann, Melanie Bradford, and Peter Chin. 2022. Semi-supervised Adversarial Text Generation based on Seq2Seq models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 254–262, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Semi-supervised Adversarial Text Generation based on Seq2Seq models (Le et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-industry.26.pdf
