A Fine-Grained Domain Adaption Model for Joint Word Segmentation and POS Tagging

Peijie Jiang, Dingkun Long, Yueheng Sun, Meishan Zhang, Guangwei Xu, Pengjun Xie


Abstract
Domain adaption for word segmentation and POS tagging is a challenging problem for Chinese lexical processing. Self-training is one promising solution, which hinges on constructing a set of high-quality pseudo training instances for the target domain. Previous work usually assumes a universal source-to-target adaption to collect such a pseudo corpus, ignoring that individual target sentences differ in their gap to the source domain. In this work, we start from joint word segmentation and POS tagging and present a fine-grained domain adaption method that models these gaps accurately. We measure the gaps with one simple and intuitive metric, and use it to incrementally build a pseudo target-domain corpus from fine-grained subdomains. A novel domain-mixed representation learning model is proposed accordingly to encode the multiple subdomains effectively. The whole process is performed progressively for both corpus construction and model training. Experimental results on a benchmark dataset show that our method gains significant improvements over a variety of baselines. Extensive analyses are also performed to show the advantages of our final domain adaption model.
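The progressive procedure summarized in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the gap metric, so `char_gap` (the fraction of characters unseen in the source domain) is a hypothetical stand-in, and `train` and `predict` are hypothetical callables supplied by the caller. The domain-mixed representation model is not sketched here.

```python
from typing import Any, Callable, List, Sequence, Set, Tuple


def char_gap(sentence: str, source_chars: Set[str]) -> float:
    """Hypothetical gap metric: fraction of characters unseen in the source domain."""
    if not sentence:
        return 0.0
    unseen = sum(1 for ch in sentence if ch not in source_chars)
    return unseen / len(sentence)


def split_into_subdomains(
    sentences: Sequence[str], source_chars: Set[str], num_subdomains: int
) -> List[List[str]]:
    """Rank target sentences by gap (nearest first) and cut them into buckets."""
    if num_subdomains <= 0:
        raise ValueError("num_subdomains must be positive")
    ranked = sorted(sentences, key=lambda s: char_gap(s, source_chars))
    if not ranked:
        return []
    size = -(-len(ranked) // num_subdomains)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def progressive_self_training(
    source_data: Sequence[Tuple[str, Any]],
    subdomains: Sequence[Sequence[str]],
    train: Callable[[List[Tuple[str, Any]]], Any],
    predict: Callable[[Any, str], Any],
) -> Any:
    """Grow the training set one subdomain at a time, retraining after each step."""
    training_set = list(source_data)
    model = train(training_set)
    for bucket in subdomains:
        # Label the next-nearest subdomain with the current model (pseudo labels).
        pseudo = [(sentence, predict(model, sentence)) for sentence in bucket]
        training_set.extend(pseudo)
        model = train(training_set)
    return model
```

Sentences closest to the source domain are pseudo-labeled first, so each later, more distant subdomain is labeled by a model that has already seen the nearer ones.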
Anthology ID:
2021.emnlp-main.291
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3587–3598
URL:
https://aclanthology.org/2021.emnlp-main.291
DOI:
10.18653/v1/2021.emnlp-main.291
Cite (ACL):
Peijie Jiang, Dingkun Long, Yueheng Sun, Meishan Zhang, Guangwei Xu, and Pengjun Xie. 2021. A Fine-Grained Domain Adaption Model for Joint Word Segmentation and POS Tagging. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3587–3598, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
A Fine-Grained Domain Adaption Model for Joint Word Segmentation and POS Tagging (Jiang et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.291.pdf
Video:
https://aclanthology.org/2021.emnlp-main.291.mp4
Code:
jzx555/fgda