Evaluation of Stacked Embeddings for Bulgarian on the Downstream Tasks POS and NERC

Iva Marinova


Abstract
This paper reports on experiments with different stacks of word embeddings and evaluates their usefulness for Bulgarian downstream tasks such as Named Entity Recognition and Classification (NERC) and Part-of-Speech (POS) tagging. Word embeddings remain at the core of NLP development, with several key language models created over the last two years, such as FastText (CITATION), ELMo (CITATION), BERT (CITATION) and Flair (CITATION). Stacking, i.e. combining, different word embeddings is a further technique used in this paper, and one not yet reported for Bulgarian NERC. A well-established sequence-tagging architecture, BiLSTM-CRF, is used, and different pre-trained language models are combined in its embedding layer to determine which combination scores best.
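The "stacking" the abstract describes amounts to concatenating, per token, the vectors produced by several pre-trained embedding models before feeding them to the tagger. A minimal, hypothetical sketch of that idea follows; the embedder functions and dimensions below are illustrative stand-ins, not the paper's actual models (which combine FastText, ELMo, BERT, and Flair embeddings inside a BiLSTM-CRF):

```python
# Hypothetical illustration of stacked embeddings: each embedder maps a
# token to a vector, and "stacking" concatenates those vectors so the
# downstream tagger sees one combined representation per token.

def stack_embeddings(token, embedders):
    """Concatenate the vectors each embedder produces for `token`."""
    stacked = []
    for embed in embedders:
        stacked.extend(embed(token))
    return stacked

# Toy stand-ins (NOT real models): a 2-dim "subword-style" embedder
# and a 1-dim "character-level" embedder.
def toy_subword_embed(token):
    return [float(len(token)), 1.0]

def toy_char_embed(token):
    return [float(ord(token[0]) % 7)]

vec = stack_embeddings("София", [toy_subword_embed, toy_char_embed])
# The stacked vector's dimension is the sum of the parts: 2 + 1 = 3.
```

In practice each real embedder outputs hundreds to thousands of dimensions, and the concatenated vector is what the BiLSTM-CRF's embedding layer consumes; the experiments in the paper vary which embedders go into that concatenation.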
Anthology ID:
R19-2008
Volume:
Proceedings of the Student Research Workshop Associated with RANLP 2019
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Venue:
RANLP
Publisher:
INCOMA Ltd.
Note:
Pages:
48–54
URL:
https://aclanthology.org/R19-2008
DOI:
10.26615/issn.2603-2821.2019_008
Cite (ACL):
Iva Marinova. 2019. Evaluation of Stacked Embeddings for Bulgarian on the Downstream Tasks POS and NERC. In Proceedings of the Student Research Workshop Associated with RANLP 2019, pages 48–54, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Evaluation of Stacked Embeddings for Bulgarian on the Downstream Tasks POS and NERC (Marinova, RANLP 2019)
PDF:
https://aclanthology.org/R19-2008.pdf