Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus

Kanako Komiya, Hiroyuki Shinnou


Abstract
Fine-tuning is a popular method for achieving better performance when only a small target corpus is available. However, it requires tuning a number of metaparameters and thus carries a risk of adverse effects when inappropriate metaparameters are used. Therefore, we investigate effective parameters for fine-tuning when only a small target corpus is available. In the current study, we aim to improve Japanese word embeddings created from a huge corpus. First, we demonstrate that even word embeddings created from a huge corpus are affected by domain shift. After that, we investigate effective parameters for fine-tuning the word embeddings using a small target corpus. We used the perplexity of a language model obtained from a Long Short-Term Memory (LSTM) network to assess the word embeddings input into the network. The experiments revealed that fine-tuning sometimes has an adverse effect when only a small target corpus is used, and that batch size is the most important parameter for fine-tuning. In addition, we confirmed that the effect of fine-tuning is greater when the target corpus is larger.
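The evaluation protocol described in the abstract (assessing embeddings by the perplexity of an LSTM language model) can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes PyTorch, and all names and sizes (VOCAB_SIZE, EMB_DIM, HIDDEN_DIM, pretrained_weights, the random batches) are illustrative placeholders. It initialises the embedding layer from pretrained vectors, keeps it trainable so the embeddings can be fine-tuned together with the LSTM on a small target corpus, and reports perplexity as the exponential of the mean cross-entropy.

    # Minimal sketch (assumptions: PyTorch; placeholder vocabulary, weights, and data).
    import math
    import torch
    import torch.nn as nn

    VOCAB_SIZE, EMB_DIM, HIDDEN_DIM = 10000, 200, 256

    class LSTMLanguageModel(nn.Module):
        def __init__(self, pretrained_weights):
            super().__init__()
            # freeze=False keeps the embedding matrix trainable, i.e. the pretrained
            # embeddings are fine-tuned along with the language model.
            self.embed = nn.Embedding.from_pretrained(pretrained_weights, freeze=False)
            self.lstm = nn.LSTM(EMB_DIM, HIDDEN_DIM, batch_first=True)
            self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

        def forward(self, token_ids):
            hidden, _ = self.lstm(self.embed(token_ids))
            return self.out(hidden)

    def perplexity(model, batches):
        """Average per-token perplexity over (input, target) id tensor pairs."""
        criterion = nn.CrossEntropyLoss()
        model.eval()
        total_loss, n_batches = 0.0, 0
        with torch.no_grad():
            for inputs, targets in batches:
                logits = model(inputs)
                loss = criterion(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
                total_loss += loss.item()
                n_batches += 1
        return math.exp(total_loss / n_batches)

    # Illustrative usage with random stand-ins for the pretrained vectors and data.
    pretrained_weights = torch.randn(VOCAB_SIZE, EMB_DIM)
    model = LSTMLanguageModel(pretrained_weights)
    inputs = torch.randint(0, VOCAB_SIZE, (8, 20))   # batch of 8 sequences, length 20
    targets = torch.randint(0, VOCAB_SIZE, (8, 20))  # next-token ids
    print(perplexity(model, [(inputs, targets)]))

In this sketch, the batch size highlighted in the abstract corresponds to the number of sequences per training batch; lower perplexity on held-out target-domain text indicates that the (fine-tuned) embeddings fit the target domain better.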
Anthology ID:
W18-3408
Volume:
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Month:
July
Year:
2018
Address:
Melbourne
Editors:
Reza Haffari, Colin Cherry, George Foster, Shahram Khadivi, Bahar Salehi
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
60–67
URL:
https://aclanthology.org/W18-3408
DOI:
10.18653/v1/W18-3408
Bibkey:
Cite (ACL):
Kanako Komiya and Hiroyuki Shinnou. 2018. Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus. In Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, pages 60–67, Melbourne. Association for Computational Linguistics.
Cite (Informal):
Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus (Komiya & Shinnou, ACL 2018)
PDF:
https://aclanthology.org/W18-3408.pdf