Bulgarian ParlaMint 4.0 corpus as a testset for Part-of-speech tagging and Named Entity Recognition

Petya Osenova, Kiril Simov


Abstract
The paper discusses fine-tuned models for part-of-speech tagging and named entity recognition. The models were fine-tuned from an existing pre-trained BERT model and two newly pre-trained BERT models for Bulgarian, and were cross-tested on the Bulgarian part of the ParlaMint corpora as a new domain. In addition, the performance of the new fine-tuned BERT models is compared with the available results from the Stanza-based model with which the Bulgarian part of the ParlaMint corpora has been annotated. The observations show the weaknesses of each model as well as the challenges common to all of them.
Anthology ID:
2024.parlaclarin-1.4
Volume:
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Darja Fiser, Maria Eskevich, David Bordon
Venues:
ParlaCLARIN | WS
Publisher:
ELRA and ICCL
Pages:
30–35
URL:
https://aclanthology.org/2024.parlaclarin-1.4
Cite (ACL):
Petya Osenova and Kiril Simov. 2024. Bulgarian ParlaMint 4.0 corpus as a testset for Part-of-speech tagging and Named Entity Recognition. In Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024, pages 30–35, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Bulgarian ParlaMint 4.0 corpus as a testset for Part-of-speech tagging and Named Entity Recognition (Osenova & Simov, ParlaCLARIN-WS 2024)
PDF:
https://aclanthology.org/2024.parlaclarin-1.4.pdf