@inproceedings{almarwani-etal-2019-efficient,
    title = "Efficient Sentence Embedding using Discrete Cosine Transform",
    author = "Almarwani, Nada and
      Aldarmaki, Hanan and
      Diab, Mona",
    editor = "Inui, Kentaro and
      Jiang, Jing and
      Ng, Vincent and
      Wan, Xiaojun",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-1380",
    doi = "10.18653/v1/D19-1380",
    pages = "3672--3678",
    abstract = "Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure. While more complex sequential or convolutional networks potentially yield superior classification performance, the improvements in classification accuracy are typically mediocre compared to the simple vector averaging. As an efficient alternative, we propose the use of discrete cosine transform (DCT) to compress word sequences in an order-preserving manner. The lower order DCT coefficients represent the overall feature patterns in sentences, which results in suitable embeddings for tasks that could benefit from syntactic features. Our results in semantic probing tasks demonstrate that DCT embeddings indeed preserve more syntactic information compared with vector averaging. With practically equivalent complexity, the model yields better overall performance in downstream classification tasks that correlate with syntactic features, which illustrates the capacity of DCT to preserve word order information.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="almarwani-etal-2019-efficient">
    <titleInfo>
      <title>Efficient Sentence Embedding using Discrete Cosine Transform</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Nada</namePart>
      <namePart type="family">Almarwani</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Hanan</namePart>
      <namePart type="family">Aldarmaki</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Mona</namePart>
      <namePart type="family">Diab</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2019-11</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Kentaro</namePart>
        <namePart type="family">Inui</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Jing</namePart>
        <namePart type="family">Jiang</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Vincent</namePart>
        <namePart type="family">Ng</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Xiaojun</namePart>
        <namePart type="family">Wan</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Hong Kong, China</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure. While more complex sequential or convolutional networks potentially yield superior classification performance, the improvements in classification accuracy are typically mediocre compared to the simple vector averaging. As an efficient alternative, we propose the use of discrete cosine transform (DCT) to compress word sequences in an order-preserving manner. The lower order DCT coefficients represent the overall feature patterns in sentences, which results in suitable embeddings for tasks that could benefit from syntactic features. Our results in semantic probing tasks demonstrate that DCT embeddings indeed preserve more syntactic information compared with vector averaging. With practically equivalent complexity, the model yields better overall performance in downstream classification tasks that correlate with syntactic features, which illustrates the capacity of DCT to preserve word order information.</abstract>
    <identifier type="citekey">almarwani-etal-2019-efficient</identifier>
    <identifier type="doi">10.18653/v1/D19-1380</identifier>
    <location>
      <url>https://aclanthology.org/D19-1380</url>
    </location>
    <part>
      <date>2019-11</date>
      <extent unit="page">
        <start>3672</start>
        <end>3678</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T Efficient Sentence Embedding using Discrete Cosine Transform
%A Almarwani, Nada
%A Aldarmaki, Hanan
%A Diab, Mona
%Y Inui, Kentaro
%Y Jiang, Jing
%Y Ng, Vincent
%Y Wan, Xiaojun
%S Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
%D 2019
%8 November
%I Association for Computational Linguistics
%C Hong Kong, China
%F almarwani-etal-2019-efficient
%X Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure. While more complex sequential or convolutional networks potentially yield superior classification performance, the improvements in classification accuracy are typically mediocre compared to the simple vector averaging. As an efficient alternative, we propose the use of discrete cosine transform (DCT) to compress word sequences in an order-preserving manner. The lower order DCT coefficients represent the overall feature patterns in sentences, which results in suitable embeddings for tasks that could benefit from syntactic features. Our results in semantic probing tasks demonstrate that DCT embeddings indeed preserve more syntactic information compared with vector averaging. With practically equivalent complexity, the model yields better overall performance in downstream classification tasks that correlate with syntactic features, which illustrates the capacity of DCT to preserve word order information.
%R 10.18653/v1/D19-1380
%U https://aclanthology.org/D19-1380
%U https://doi.org/10.18653/v1/D19-1380
%P 3672-3678
Markdown (Informal)
[Efficient Sentence Embedding using Discrete Cosine Transform](https://aclanthology.org/D19-1380) (Almarwani et al., EMNLP-IJCNLP 2019)
ACL
Nada Almarwani, Hanan Aldarmaki, and Mona Diab. 2019. Efficient Sentence Embedding using Discrete Cosine Transform. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3672–3678, Hong Kong, China. Association for Computational Linguistics.
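
For readers who want to experiment with the method the abstract describes, below is a minimal sketch of DCT-based sentence embeddings in Python. It assumes a sentence is already given as a matrix of pretrained word vectors (one row per word); the function name, the use of scipy.fft.dct, the coefficient count k, and the zero-padding rule for short sentences are illustrative assumptions, not details taken from the authors' implementation.

```python
# Minimal sketch of DCT sentence embeddings, assuming pretrained word
# vectors are available; names and parameter choices are illustrative,
# not the authors' code.
import numpy as np
from scipy.fft import dct

def dct_sentence_embedding(word_vectors: np.ndarray, k: int = 4) -> np.ndarray:
    """Compress an (n_words x dim) matrix of word vectors into a fixed-size
    embedding: apply a type-II DCT along the word axis and keep the first
    k coefficients per dimension."""
    n, d = word_vectors.shape
    # Orthonormal DCT over the word sequence, independently per dimension.
    coeffs = dct(word_vectors, type=2, norm="ortho", axis=0)
    if n < k:
        # Pad short sentences with zero coefficients so the output
        # always has length k * d.
        coeffs = np.vstack([coeffs, np.zeros((k - n, d))])
    # Low-order coefficients capture coarse, order-sensitive patterns;
    # the 0th coefficient is proportional to the average word vector.
    return coeffs[:k].reshape(-1)

# Hypothetical usage with random stand-in word vectors:
rng = np.random.default_rng(0)
sentence = rng.normal(size=(7, 300))   # 7 words, 300-dim embeddings
print(dct_sentence_embedding(sentence, k=4).shape)  # (1200,)
```

Note that with k = 1 the embedding reduces, up to a length-dependent scaling factor, to plain vector averaging, which matches the abstract's claim of practically equivalent complexity; the higher-order coefficients are what add the word-order information.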