Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding

Joseph Sanu, Mingbin Xu, Hui Jiang, Quan Liu


Abstract
In this paper, we propose to learn word embeddings based on the recent fixed-size ordinally forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence into a fixed-size representation. We use FOFE to fully encode the left and right context of each word in a corpus to construct a novel word-context matrix, which is further weighted and factorized using truncated SVD to generate low-dimensional word embedding vectors. We evaluate this alternative way of encoding word-context statistics and show that the new FOFE encoding has a notable effect on the resulting word embeddings. Experimental results on several popular word similarity tasks demonstrate that the proposed method outperforms other SVD models that use canonical count-based techniques to generate word-context matrices.
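To make the pipeline concrete, below is a minimal sketch of the idea described in the abstract: FOFE codes of each word's left and right contexts are accumulated into a word-context matrix, which is then weighted and factorized with truncated SVD. The toy corpus, the forgetting factor alpha = 0.7, the PPMI weighting, and the sqrt-of-singular-values scaling are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch of FOFE-based word embeddings (assumed toy corpus and settings).
import numpy as np
from scipy.sparse.linalg import svds

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V, alpha, dim = len(vocab), 0.7, 3  # alpha is the FOFE forgetting factor (assumed)

# Accumulate FOFE codes of the left and right context of every word occurrence.
# FOFE recursion: z_t = alpha * z_{t-1} + one_hot(w_t).
counts = np.zeros((V, 2 * V))
for sent in corpus:
    T = len(sent)
    left = np.zeros((T, V))
    z = np.zeros(V)
    for t, w in enumerate(sent):
        left[t] = z                  # FOFE code of the words preceding position t
        z = alpha * z
        z[idx[w]] += 1.0
    right = np.zeros((T, V))
    z = np.zeros(V)
    for t in range(T - 1, -1, -1):
        right[t] = z                 # FOFE code of the words following position t
        z = alpha * z
        z[idx[sent[t]]] += 1.0
    for t, w in enumerate(sent):
        counts[idx[w], :V] += left[t]
        counts[idx[w], V:] += right[t]

# Weight the word-context matrix (PPMI here, as an assumed choice) and factorize
# it with truncated SVD to obtain low-dimensional embeddings.
total = counts.sum()
row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(counts * total / (row @ col))
ppmi = np.maximum(pmi, 0.0)
ppmi[~np.isfinite(ppmi)] = 0.0

U, S, _ = svds(ppmi, k=dim)          # truncated SVD of the weighted matrix
embeddings = U * np.sqrt(S)          # one dim-dimensional vector per word
print({w: embeddings[idx[w]].round(3) for w in vocab})
```

In this sketch each target word's row simply sums the FOFE codes of all its occurrences' contexts; the weighting scheme and SVD rank are the main knobs one would tune to reproduce the reported word-similarity results.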
Anthology ID:
D17-1031
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
310–315
URL:
https://aclanthology.org/D17-1031
DOI:
10.18653/v1/D17-1031
Cite (ACL):
Joseph Sanu, Mingbin Xu, Hui Jiang, and Quan Liu. 2017. Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 310–315, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding (Sanu et al., EMNLP 2017)
PDF:
https://aclanthology.org/D17-1031.pdf