Exploiting Word Internal Structures for Generic Chinese Sentence Representation

Shaonan Wang, Jiajun Zhang, Chengqing Zong


Abstract
We introduce a novel mixed character-word architecture to improve Chinese sentence representations by utilizing the rich semantic information of word internal structures. Our architecture uses two key strategies. The first is a mask gate on characters, which learns the relation among characters in a word. The second is a max-pooling operation on words, which adaptively finds the optimal mixture of the atomic and compositional word representations. Finally, the proposed architecture is applied to various sentence composition models and achieves substantial performance gains over baseline models on the sentence similarity task.
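The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the gate equations, so the specific form used here is an assumption: a sigmoid gate computed from the concatenated character and word vectors, gated characters summed into a compositional word vector, and an element-wise max over the atomic (word) and compositional vectors. All function names, shapes, and the toy values are hypothetical and chosen only for illustration.

```python
import math

def mask_gate(char_vec, word_vec, weights, bias):
    """Assumed gate form: sigmoid(W . [char; word] + b), one value per dimension."""
    joined = char_vec + word_vec  # list concatenation, length 2 * dim
    gate = []
    for row, b in zip(weights, bias):
        z = sum(w * x for w, x in zip(row, joined)) + b
        gate.append(1.0 / (1.0 + math.exp(-z)))
    return gate

def compositional_word(char_vecs, word_vec, weights, bias):
    """Sum the gated character vectors into one compositional word vector (assumed)."""
    total = [0.0] * len(word_vec)
    for char_vec in char_vecs:
        gate = mask_gate(char_vec, word_vec, weights, bias)
        for k, (g, c) in enumerate(zip(gate, char_vec)):
            total[k] += g * c
    return total

def mixed_word(char_vecs, word_vec, weights, bias):
    """Element-wise max over the atomic and compositional word representations."""
    comp = compositional_word(char_vecs, word_vec, weights, bias)
    return [max(a, c) for a, c in zip(word_vec, comp)]

# Toy example: a two-character word with 3-dimensional embeddings.
dim = 3
word = [0.2, -0.1, 0.5]                        # atomic word embedding
chars = [[1.0, 0.0, -1.0], [0.0, -2.0, 1.0]]   # character embeddings
W = [[0.0] * (2 * dim) for _ in range(dim)]    # untrained gate weights (all zero)
b = [0.0] * dim                                # so every gate value is sigmoid(0) = 0.5

mixed = mixed_word(chars, word, W, b)
print(mixed)  # [0.5, -0.1, 0.5]
```

With zero weights every gate value is 0.5, so the compositional vector is half the sum of the character vectors, and the max keeps whichever of the two representations is larger in each dimension. In a trained model the gate parameters would be learned jointly with the sentence composition model.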
Anthology ID:
D17-1029
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
298–303
URL:
https://aclanthology.org/D17-1029/
DOI:
10.18653/v1/D17-1029
Cite (ACL):
Shaonan Wang, Jiajun Zhang, and Chengqing Zong. 2017. Exploiting Word Internal Structures for Generic Chinese Sentence Representation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 298–303, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Exploiting Word Internal Structures for Generic Chinese Sentence Representation (Wang et al., EMNLP 2017)
PDF:
https://aclanthology.org/D17-1029.pdf