Field Embedding: A Unified Grain-Based Framework for Word Representation

Junjie Luo, Xi Chen, Jichao Sun, Yuejia Xiang, Ningyu Zhang, Xiang Wan


Abstract
Word representations enriched with additional linguistic information have been widely studied and shown to outperform traditional embeddings. Current methods mainly focus on learning embeddings for words, while the embeddings of the linguistic information (referred to as grain embeddings) are discarded after learning. This work proposes a framework, field embedding, that jointly learns word and grain embeddings by incorporating morphological, phonetic, and syntactic linguistic fields. The framework leverages an innovative fine-grained pipeline that integrates multiple linguistic fields and produces high-quality grain sequences for learning superior word representations. A novel algorithm is also designed to learn embeddings for words and grains by capturing both the information contained within each field and the information shared across fields. Experimental results on lexical tasks and downstream natural language processing tasks show that our framework learns better word embeddings and grain embeddings. Qualitative evaluations show that grain embeddings effectively capture semantic information.
Anthology ID:
2021.naacl-main.140
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1754–1762
URL:
https://aclanthology.org/2021.naacl-main.140
DOI:
10.18653/v1/2021.naacl-main.140
Cite (ACL):
Junjie Luo, Xi Chen, Jichao Sun, Yuejia Xiang, Ningyu Zhang, and Xiang Wan. 2021. Field Embedding: A Unified Grain-Based Framework for Word Representation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1754–1762, Online. Association for Computational Linguistics.
Cite (Informal):
Field Embedding: A Unified Grain-Based Framework for Word Representation (Luo et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.140.pdf
Optional supplementary data:
 2021.naacl-main.140.OptionalSupplementaryData.zip
Optional supplementary code:
 2021.naacl-main.140.OptionalSupplementaryCode.zip
Video:
 https://aclanthology.org/2021.naacl-main.140.mp4