Domain-Specific New Words Detection in Chinese

Ao Chen, Maosong Sun


Abstract
With the explosive growth of the Internet, more and more domain-specific environments have appeared, such as forums, blogs, and MOOCs. Domain-specific words arise in these environments and play a critical role in domain-specific NLP tasks. This paper aims to extract Chinese domain-specific new words automatically. The extraction covers two kinds of words: words that are new in the domain and words that are especially important to it. In this work, we propose a joint statistical model that performs these two tasks simultaneously. Compared to traditional new-word detection models, our model does not need handcrafted features, which are labor-intensive to produce. Experimental results demonstrate that our joint model achieves better performance than state-of-the-art methods.
Anthology ID:
S17-1005
Volume:
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Nancy Ide, Aurélie Herbelot, Lluís Màrquez
Venue:
*SEM
SIGs:
SIGLEX | SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
44–53
URL:
https://aclanthology.org/S17-1005
DOI:
10.18653/v1/S17-1005
Cite (ACL):
Ao Chen and Maosong Sun. 2017. Domain-Specific New Words Detection in Chinese. In Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), pages 44–53, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Domain-Specific New Words Detection in Chinese (Chen & Sun, *SEM 2017)
PDF:
https://aclanthology.org/S17-1005.pdf
Code
 dreamszl/dtopwords