@inproceedings{shen-etal-2019-learning,
    title = "Learning Compressed Sentence Representations for On-Device Text Processing",
    author = "Shen, Dinghan  and
      Cheng, Pengyu  and
      Sundararaman, Dhanasekar  and
      Zhang, Xinyuan  and
      Yang, Qian  and
      Tang, Meng  and
      Celikyilmaz, Asli  and
      Carin, Lawrence",
    editor = "Korhonen, Anna  and
      Traum, David  and
      M{\`a}rquez, Llu{\'i}s",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1011/",
    doi = "10.18653/v1/P19-1011",
    pages = "107--116",
    abstract = "Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued, giving rise to a large memory footprint and slow retrieval speed, which hinders their applicability to low-resource (memory and computation) platforms, such as mobile devices. In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form, while preserving their rich semantic information. The introduced methods are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2{\%} relative to their continuous counterparts, while reducing the storage requirement by over 98{\%}. Moreover, with the learned binary representations, the semantic relatedness of two sentences can be evaluated by simply calculating their Hamming distance, which is more computational efficient compared with the inner product operation between continuous embeddings. Detailed analysis and case study further validate the effectiveness of proposed methods."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="shen-etal-2019-learning">
    <titleInfo>
        <title>Learning Compressed Sentence Representations for On-Device Text Processing</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Dinghan</namePart>
        <namePart type="family">Shen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Pengyu</namePart>
        <namePart type="family">Cheng</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Dhanasekar</namePart>
        <namePart type="family">Sundararaman</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Xinyuan</namePart>
        <namePart type="family">Zhang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Qian</namePart>
        <namePart type="family">Yang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Meng</namePart>
        <namePart type="family">Tang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Asli</namePart>
        <namePart type="family">Celikyilmaz</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Lawrence</namePart>
        <namePart type="family">Carin</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2019-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Anna</namePart>
            <namePart type="family">Korhonen</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">David</namePart>
            <namePart type="family">Traum</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Lluís</namePart>
            <namePart type="family">Màrquez</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Florence, Italy</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued, giving rise to a large memory footprint and slow retrieval speed, which hinders their applicability to low-resource (memory and computation) platforms, such as mobile devices. In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form, while preserving their rich semantic information. The introduced methods are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2% relative to their continuous counterparts, while reducing the storage requirement by over 98%. Moreover, with the learned binary representations, the semantic relatedness of two sentences can be evaluated by simply calculating their Hamming distance, which is more computationally efficient than the inner product operation between continuous embeddings. Detailed analysis and a case study further validate the effectiveness of the proposed methods.</abstract>
    <identifier type="citekey">shen-etal-2019-learning</identifier>
    <identifier type="doi">10.18653/v1/P19-1011</identifier>
    <location>
        <url>https://aclanthology.org/P19-1011/</url>
    </location>
    <part>
        <date>2019-07</date>
        <extent unit="page">
            <start>107</start>
            <end>116</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Learning Compressed Sentence Representations for On-Device Text Processing
%A Shen, Dinghan
%A Cheng, Pengyu
%A Sundararaman, Dhanasekar
%A Zhang, Xinyuan
%A Yang, Qian
%A Tang, Meng
%A Celikyilmaz, Asli
%A Carin, Lawrence
%Y Korhonen, Anna
%Y Traum, David
%Y Màrquez, Lluís
%S Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
%D 2019
%8 July
%I Association for Computational Linguistics
%C Florence, Italy
%F shen-etal-2019-learning
%X Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued, giving rise to a large memory footprint and slow retrieval speed, which hinders their applicability to low-resource (memory and computation) platforms, such as mobile devices. In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form, while preserving their rich semantic information. The introduced methods are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2% relative to their continuous counterparts, while reducing the storage requirement by over 98%. Moreover, with the learned binary representations, the semantic relatedness of two sentences can be evaluated by simply calculating their Hamming distance, which is more computationally efficient than the inner product operation between continuous embeddings. Detailed analysis and a case study further validate the effectiveness of the proposed methods.
%R 10.18653/v1/P19-1011
%U https://aclanthology.org/P19-1011/
%U https://doi.org/10.18653/v1/P19-1011
%P 107-116
Markdown (Informal)

[Learning Compressed Sentence Representations for On-Device Text Processing](https://aclanthology.org/P19-1011/) (Shen et al., ACL 2019)

ACL

Dinghan Shen, Pengyu Cheng, Dhanasekar Sundararaman, Xinyuan Zhang, Qian Yang, Meng Tang, Asli Celikyilmaz, and Lawrence Carin. 2019. Learning Compressed Sentence Representations for On-Device Text Processing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 107–116, Florence, Italy. Association for Computational Linguistics.
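
As a quick illustration of the Hamming-distance comparison described in the abstract, here is a minimal Python sketch. It assumes hard-threshold binarization at zero and a 4096-dimensional, InferSent-style input vector; these are illustrative assumptions, not the paper's exact setup, and the random vectors stand in for real encoder outputs.

```python
import numpy as np

# Toy continuous sentence embeddings. The 4096-d size mirrors common
# InferSent-style encoders; the random vectors are purely illustrative.
rng = np.random.default_rng(0)
emb_a = rng.standard_normal(4096).astype(np.float32)
emb_b = rng.standard_normal(4096).astype(np.float32)

def binarize(v: np.ndarray) -> np.ndarray:
    # Hard-threshold binarization at zero. This stands in for the
    # simplest of the paper's four strategies; the exact threshold
    # is an assumption here.
    return (v > 0).astype(np.uint8)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    # Count differing bits: a cheap comparison that replaces the
    # float inner product used with continuous embeddings.
    return int(np.count_nonzero(a != b))

bits_a, bits_b = binarize(emb_a), binarize(emb_b)
print(hamming_distance(bits_a, bits_b))  # smaller => more related

# Storage: packing 4096 bits into bytes gives 512 B per sentence
# versus 16 KB for the float32 vector (a 32x reduction).
packed_a = np.packbits(bits_a)
print(emb_a.nbytes, packed_a.nbytes)
```

Note that the 32x figure in the sketch is just float32-versus-one-bit storage; the paper's reported 98%+ reduction presumably also depends on its learned strategies producing fewer bits than the original dimensionality.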