ETNLP: A Visual-Aided Systematic Approach to Select Pre-Trained Embeddings for a Downstream Task

Son Vu Xuan, Thanh Vu, Son Tran, Lili Jiang


Abstract
Given the many advanced embedding models released recently, selecting the pre-trained word representation (i.e., word embedding) models that best fit a specific downstream NLP task is non-trivial. In this paper, we propose a systematic approach to extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. First, for extraction, we provide a method to extract a subset of the embeddings to be used in the downstream NLP task. Second, for evaluation, we analyse the quality of pre-trained embeddings using an input word analogy list. Finally, we visualize the embedding space to explore the embedded words interactively. We demonstrate the effectiveness of the proposed approach on our pre-trained Vietnamese word embedding models by selecting which models are suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then use the selected embeddings for the NER task and achieve new state-of-the-art results on the task's benchmark dataset. We also apply the approach to another downstream task, privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we create an open-source system using the proposed systematic approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp.
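The extraction and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the ETNLP code base: the function names (`extract_subset`, `solve_analogy`, `analogy_accuracy`) and the toy vectors are hypothetical, and the scoring rule assumed here is the common vector-offset method (cosine similarity to b - a + c), which the abstract does not specify.

```python
import numpy as np

def extract_subset(embeddings, vocab):
    """Keep only the vectors for words that appear in the task vocabulary."""
    return {w: embeddings[w] for w in vocab if w in embeddings}

def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' by cosine similarity to b - a + c.

    The three query words are excluded from the candidates, as is usual
    for this kind of evaluation (an assumption, not ETNLP's documented rule).
    """
    target = embeddings[b] - embeddings[a] + embeddings[c]
    target = target / np.linalg.norm(target)
    best_word, best_score = None, -np.inf
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue
        score = float(np.dot(vec, target) / np.linalg.norm(vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

def analogy_accuracy(embeddings, analogies):
    """Fraction of (a, b, c, d) items answered correctly.

    Items containing out-of-vocabulary words are skipped.
    """
    covered = [q for q in analogies if all(w in embeddings for w in q)]
    if not covered:
        return 0.0
    correct = sum(solve_analogy(embeddings, a, b, c) == d
                  for a, b, c, d in covered)
    return correct / len(covered)

# Toy 2-d vectors, invented purely for illustration.
toy = {
    "man": np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king": np.array([3.0, 0.0]),
    "queen": np.array([3.0, 1.0]),
    "apple": np.array([-1.0, 2.0]),
}

analogy_list = [("man", "woman", "king", "queen")]
print(analogy_accuracy(toy, analogy_list))
```

Under these assumptions, candidate embedding sets could be compared by computing `analogy_accuracy` for each on the same analogy list and preferring the higher-scoring ones before training the downstream model.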
Anthology ID:
R19-1147
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
1285–1294
URL:
https://aclanthology.org/R19-1147
DOI:
10.26615/978-954-452-056-4_147
Cite (ACL):
Son Vu Xuan, Thanh Vu, Son Tran, and Lili Jiang. 2019. ETNLP: A Visual-Aided Systematic Approach to Select Pre-Trained Embeddings for a Downstream Task. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 1285–1294, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
ETNLP: A Visual-Aided Systematic Approach to Select Pre-Trained Embeddings for a Downstream Task (Vu Xuan et al., RANLP 2019)
PDF:
https://aclanthology.org/R19-1147.pdf
Code:
https://github.com/vietnlp/etnlp