Subword Attention and Post-Processing for Rare and Unknown Contextualized Embeddings

Raj Patel, Carlotta Domeniconi


Abstract
Word representations are an important aspect of Natural Language Processing (NLP). Representations are trained on large corpora, either as independent static embeddings or as part of a deep contextualized model. While word embeddings are useful, they struggle with rare and unknown words. As such, a large body of work has focused on estimating embeddings for rare and unknown words. However, most of these methods target static embeddings, with few addressing contextualized representations. In this work, we propose SPRUCE, a rare/unknown embedding architecture for contextualized representations. The architecture combines subword attention and embedding post-processing with the contextualized model to produce high-quality embeddings. We demonstrate that these techniques lead to improved performance on most intrinsic and downstream tasks.
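The two ingredients named in the abstract, attention pooling over subword vectors and a post-processing step on the resulting embedding, can be illustrated with a minimal sketch. This is not the paper's actual SPRUCE implementation; the learned query vector, the subword vectors, and the mean-centering post-processing used here are illustrative assumptions.

```python
import numpy as np

def subword_attention_embedding(subword_vecs, query, corpus_mean):
    """Estimate an embedding for a rare/unknown word.

    subword_vecs: (n_subwords, dim) array of subword embeddings.
    query: (dim,) vector standing in for a learned attention query
           (hypothetical; not the paper's exact parameterization).
    corpus_mean: (dim,) mean embedding used for a simple
           mean-centering post-processing step (also illustrative).
    """
    S = np.asarray(subword_vecs, dtype=float)  # (n_subwords, dim)
    scores = S @ query                         # attention logits, (n_subwords,)
    scores -= scores.max()                     # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()                   # softmax attention weights
    pooled = weights @ S                       # attention-weighted sum of subwords
    return pooled - corpus_mean                # post-process: mean-center

# Example: pool four 8-dimensional subword vectors into one embedding.
rng = np.random.default_rng(0)
sub = rng.normal(size=(4, 8))
q = rng.normal(size=8)
mu = sub.mean(axis=0)
emb = subword_attention_embedding(sub, q, mu)
```

In a full model, the query and the post-processing statistics would be learned or computed from the corpus rather than sampled; the sketch only shows how attention weights select among subword pieces before the correction step.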
Anthology ID:
2024.findings-naacl.88
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1383–1389
URL:
https://aclanthology.org/2024.findings-naacl.88
Cite (ACL):
Raj Patel and Carlotta Domeniconi. 2024. Subword Attention and Post-Processing for Rare and Unknown Contextualized Embeddings. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 1383–1389, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Subword Attention and Post-Processing for Rare and Unknown Contextualized Embeddings (Patel & Domeniconi, Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.88.pdf
Copyright:
2024.findings-naacl.88.copyright.pdf