Simple Data Augmentation with the Mask Token Improves Domain Adaptation for Dialog Act Tagging

Semih Yavuz; Kazuma Hashimoto; Wenhao Liu; Nitish Shirish Keskar; Richard Socher; Caiming Xiong

doi:10.18653/v1/2020.emnlp-main.412

Abstract

The concept of Dialogue Act (DA) is universal across different task-oriented dialogue domains: the act of “request” carries the same speaker intention whether it is for restaurant reservation or flight booking. However, DA taggers trained on one domain do not generalize well to other domains, which creates an expensive need for a large amount of annotated data in the target domain. In this work, we investigate how to better adapt DA taggers to desired target domains with only unlabeled data. We propose MaskAugment, a controllable mechanism that augments text input by leveraging the pre-trained Mask token from the BERT model. Inspired by consistency regularization, we use MaskAugment to introduce an unsupervised teacher-student learning scheme to examine the domain adaptation of DA taggers. Our extensive experiments on the Simulated Dialogue (GSim) and Schema-Guided Dialogue (SGD) datasets show that MaskAugment is useful in improving the cross-domain generalization for DA tagging.
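
The abstract does not spell out how MaskAugment selects tokens or how the teacher-student objective is computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that augmentation means replacing each token with BERT's `[MASK]` string independently with probability `mask_prob` (the "controllable" knob here is hypothetical), and that consistency is measured as a KL divergence between a teacher's label distribution on the original utterance and a student's distribution on the masked utterance. The names `mask_augment`, `kl_divergence`, and `mask_prob` are illustrative and do not come from the paper.

```python
import math
import random

MASK = "[MASK]"  # the mask token string used by BERT vocabularies


def mask_augment(tokens, mask_prob, rng):
    """Return a copy of `tokens` where each token is replaced by [MASK]
    independently with probability `mask_prob` (an assumed selection rule)."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) over two discrete label distributions.
    `eps` guards against log(0); an assumed choice of consistency measure."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(teacher_probs, student_probs)
    )


# Illustrative usage on an unlabeled target-domain utterance.
rng = random.Random(0)
tokens = "i would like to book a table for two".split()
augmented = mask_augment(tokens, mask_prob=0.3, rng=rng)

# Stand-in distributions over dialogue acts; a real tagger would produce these.
teacher_probs = [0.9, 0.1]  # teacher sees the original utterance
student_probs = [0.5, 0.5]  # student sees the masked utterance
consistency_loss = kl_divergence(teacher_probs, student_probs)
```

In a setup like this, the loss would be minimized with respect to the student only, so that its predictions on masked inputs move toward the teacher's predictions on clean inputs without requiring target-domain labels.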

Anthology ID: 2020.emnlp-main.412
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 5083–5089
URL: https://aclanthology.org/2020.emnlp-main.412
DOI: 10.18653/v1/2020.emnlp-main.412
Cite (ACL): Semih Yavuz, Kazuma Hashimoto, Wenhao Liu, Nitish Shirish Keskar, Richard Socher, and Caiming Xiong. 2020. Simple Data Augmentation with the Mask Token Improves Domain Adaptation for Dialog Act Tagging. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5083–5089, Online. Association for Computational Linguistics.
Cite (Informal): Simple Data Augmentation with the Mask Token Improves Domain Adaptation for Dialog Act Tagging (Yavuz et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-main.412.pdf
Video: https://slideslive.com/38939326
