Domain Adaptation with BERT-based Domain Classification and Data Selection

Xiaofei Ma, Peng Xu, Zhiguo Wang, Ramesh Nallapati, Bing Xiang


Abstract
The performance of deep neural models can deteriorate substantially when there is a domain shift between training and test data. For example, the pre-trained BERT model can be easily fine-tuned with just one additional output layer to create a state-of-the-art model for a wide range of tasks. However, the fine-tuned BERT model suffers a considerable performance drop when applied zero-shot to a different domain. In this paper, we present a novel two-step domain adaptation framework based on curriculum learning and domain-discriminative data selection. The domain adaptation is conducted in a mostly unsupervised manner, using only a small target-domain validation set for hyper-parameter tuning. We test the framework on four large public datasets with different degrees of domain similarity and different task types. Our framework outperforms a popular discrepancy-based domain adaptation method on most transfer tasks while consuming only a fraction of the training budget.
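To make the data-selection idea concrete, here is a minimal sketch (assuming a HuggingFace transformers setup) of domain-discriminative data selection: a BERT classifier is fine-tuned to separate source from target texts, which needs only domain identity rather than task labels, and its posterior P(target | x) then ranks the labeled source examples so the most target-like ones are kept for task fine-tuning. The helper names and hyper-parameters below are hypothetical, not the authors' released implementation, and the curriculum step (ordering training by these scores) is omitted for brevity.

```python
# Hedged sketch of BERT-based domain-discriminative data selection.
# Illustrative only: function names, batch sizes, and the selection
# criterion are assumptions, not the paper's exact implementation.
import random
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # label 0 = source domain, 1 = target

def train_domain_classifier(source_texts, target_texts, epochs=1, lr=2e-5):
    """Fine-tune BERT to tell source texts from target texts.
    Domain labels come for free, so no task annotation is required."""
    data = [(t, 0) for t in source_texts] + [(t, 1) for t in target_texts]
    random.shuffle(data)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for i in range(0, len(data), 16):
            texts, labels = zip(*data[i:i + 16])
            enc = tokenizer(list(texts), padding=True, truncation=True,
                            max_length=128, return_tensors="pt")
            loss = model(**enc, labels=torch.tensor(labels)).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

def select_most_target_like(source_examples, k):
    """Rank labeled source examples by P(target domain | x), keep the top k."""
    model.eval()
    scores = []
    with torch.no_grad():
        for i in range(0, len(source_examples), 32):
            texts = [ex["text"] for ex in source_examples[i:i + 32]]
            enc = tokenizer(texts, padding=True, truncation=True,
                            max_length=128, return_tensors="pt")
            probs = torch.softmax(model(**enc).logits, dim=-1)[:, 1]
            scores.extend(probs.tolist())
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [source_examples[i] for i in order[:k]]
```

The selected subset, or a curriculum ordered by these scores, would then drive task fine-tuning on increasingly target-like data.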
Anthology ID: D19-6109
Volume: Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Colin Cherry, Greg Durrett, George Foster, Reza Haffari, Shahram Khadivi, Nanyun Peng, Xiang Ren, Swabha Swayamdipta
Venue: WS
Publisher: Association for Computational Linguistics
Pages: 76–83
URL: https://aclanthology.org/D19-6109
DOI: 10.18653/v1/D19-6109
Cite (ACL): Xiaofei Ma, Peng Xu, Zhiguo Wang, Ramesh Nallapati, and Bing Xiang. 2019. Domain Adaptation with BERT-based Domain Classification and Data Selection. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 76–83, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): Domain Adaptation with BERT-based Domain Classification and Data Selection (Ma et al., 2019)
PDF: https://aclanthology.org/D19-6109.pdf