Analogy-Guided Evolutionary Pretraining of Binary Word Embeddings

R. Alexander Knipper, Md. Mahadi Hassan, Mehdi Sadi, Shubhra Kanti Karmaker Santu


Abstract
As low-power computing paradigms (Neuromorphic Computing, Spiking Neural Networks, etc.) become more popular, learning binary word embeddings has become increasingly important for supporting NLP applications at the edge. Existing binary word embeddings are mostly derived from pretrained real-valued embeddings through simple transformations, which often break the semantic consistency and the so-called “arithmetic” properties learned by the original, real-valued embeddings. This paper addresses this limitation by introducing a new approach that learns binary embeddings from scratch, preserving both the semantic relationships between words and the arithmetic properties of the embeddings themselves. To achieve this, we propose a novel genetic algorithm that learns the relationships between words from existing word analogy datasets, carefully ensuring that the arithmetic properties of the relationships are preserved. Evaluating our generated 16-, 32-, and 64-bit binary word embeddings on Mikolov’s word analogy task shows that more than 95% of the time, the best fit for the analogy is ranked in the top 5 most similar words in terms of cosine similarity.
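To make the evaluation protocol concrete, here is a minimal sketch (not the authors’ released code) of the test the abstract describes: for an analogy a : b :: c : ?, form the Mikolov-style query vector b − a + c over {0,1}-valued embeddings and check whether the true answer ranks in the top 5 by cosine similarity. The toy vocabulary and random 16-bit vectors below are illustrative assumptions only.

```python
import numpy as np

def topk_by_cosine(query, matrix, k=5):
    """Return indices of the k rows of `matrix` most cosine-similar to `query`."""
    sims = matrix @ query / (
        np.linalg.norm(matrix, axis=1) * np.linalg.norm(query) + 1e-9
    )
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
vocab = ["king", "man", "woman", "queen", "apple", "car"]
# Toy 16-bit binary embeddings; the paper learns these with a genetic algorithm.
emb = rng.integers(0, 2, size=(len(vocab), 16)).astype(float)

a, b, c, d = (vocab.index(w) for w in ("man", "king", "woman", "queen"))
query = emb[b] - emb[a] + emb[c]  # analogy arithmetic: king - man + woman
hits = topk_by_cosine(query, emb, k=5)
print("queen in top 5:", d in hits)
```

With real learned embeddings, the reported metric is the fraction of analogy queries for which the correct word appears among those top-5 neighbors.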
Anthology ID:
2022.aacl-main.52
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
November
Year:
2022
Address:
Online only
Editors:
Yulan He, Heng Ji, Sujian Li, Yang Liu, Chia-Hui Chang
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
683–693
URL:
https://aclanthology.org/2022.aacl-main.52
Cite (ACL):
R. Alexander Knipper, Md. Mahadi Hassan, Mehdi Sadi, and Shubhra Kanti Karmaker Santu. 2022. Analogy-Guided Evolutionary Pretraining of Binary Word Embeddings. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 683–693, Online only. Association for Computational Linguistics.
Cite (Informal):
Analogy-Guided Evolutionary Pretraining of Binary Word Embeddings (Knipper et al., AACL-IJCNLP 2022)
PDF:
https://aclanthology.org/2022.aacl-main.52.pdf
Dataset:
 2022.aacl-main.52.Dataset.zip
Software:
 2022.aacl-main.52.Software.zip