Neural-based Noise Filtering from Word Embeddings

Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu


Abstract
Word embeddings have been shown to benefit NLP tasks substantially. Yet there is room for improvement in the vector representations, because current word embeddings typically contain unnecessary information, i.e., noise. We propose two novel models that improve word embeddings through unsupervised learning, yielding word denoising embeddings. The word denoising embeddings are obtained by strengthening salient information and weakening noise in the original word embeddings, using a deep feed-forward neural network filter. Results on benchmark tasks show that the filtered word denoising embeddings outperform the original word embeddings.
Anthology ID:
C16-1254
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
2699–2707
URL:
https://aclanthology.org/C16-1254
Cite (ACL):
Kim Anh Nguyen, Sabine Schulte im Walde, and Ngoc Thang Vu. 2016. Neural-based Noise Filtering from Word Embeddings. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2699–2707, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Neural-based Noise Filtering from Word Embeddings (Nguyen et al., COLING 2016)
PDF:
https://aclanthology.org/C16-1254.pdf
Code:
nguyenkh/NeuralDenoising