Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

Lihu Chen, Gaël Varoquaux, Fabian Suchanek


Abstract
State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words. To address this issue, we follow the principle of mimick-like models to generate vectors for unseen words, by learning the behavior of pre-trained embeddings using only the surface form of words. We present a simple contrastive learning framework, LOVE, which extends the word representation of an existing pre-trained language model (such as BERT) and makes it robust to OOV with few additional parameters. Extensive evaluations demonstrate that our lightweight model achieves similar or even better performances than prior competitors, both on original datasets and on corrupted variants. Moreover, it can be used in a plug-and-play fashion with FastText and BERT, where it significantly improves their robustness.
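The mimick-style idea in the abstract (learn a word's vector from its surface form alone, trained contrastively against pre-trained embeddings) can be illustrated with a toy sketch. This is not the authors' implementation: the hashed character-trigram encoder and the InfoNCE-style loss below are illustrative assumptions, with toy dimensions.

```python
import zlib
import numpy as np

def char_ngrams(word, n=3):
    # Surface form only: padded character trigrams (FastText-style features)
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]

_rng = np.random.default_rng(0)
_BUCKETS = _rng.standard_normal((1000, 16))  # toy hashed-feature table

def encode(word):
    # Hash each trigram to a bucket, sum the bucket vectors, L2-normalize.
    # This lets the model produce a vector for any unseen (OOV) word.
    idx = [zlib.crc32(g.encode()) % 1000 for g in char_ngrams(word)]
    v = _BUCKETS[idx].sum(axis=0)
    return v / (np.linalg.norm(v) + 1e-9)

def info_nce(anchor, positive, negatives, tau=0.07):
    # Contrastive objective: pull the surface-form vector (anchor) toward the
    # word's pre-trained embedding (positive), away from other words (negatives).
    sims = np.array([anchor @ positive] + [anchor @ n for n in negatives]) / tau
    sims -= sims.max()  # numerical stability
    return -np.log(np.exp(sims[0]) / np.exp(sims).sum())
```

At inference, an OOV word's embedding is imputed as `encode(word)`; because misspellings share most trigrams with their in-vocabulary forms, their imputed vectors stay close to the original embeddings, which is what makes the downstream model robust to corrupted inputs.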
Anthology ID:
2022.acl-long.245
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3488–3504
URL:
https://aclanthology.org/2022.acl-long.245
DOI:
10.18653/v1/2022.acl-long.245
Bibkey:
Cite (ACL):
Lihu Chen, Gaël Varoquaux, and Fabian Suchanek. 2022. Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3488–3504, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost (Chen et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.245.pdf
Software:
 2022.acl-long.245.software.zip
Video:
 https://aclanthology.org/2022.acl-long.245.mp4
Code:
 tigerchen52/love
Data:
 SST, SST-2