BERT-PersNER: A New Model for Persian Named Entity Recognition

Farane Jalali Farahani, Gholamreza Ghassem-Sani


Abstract
Named entity recognition (NER) is one of the major tasks in natural language processing (NLP). A named entity is typically a word or expression that carries a valuable piece of information, which can be used effectively by other major NLP tasks such as machine translation, question answering, and text summarization. In this paper, we introduce a new model called BERT-PersNER (BERT-based Persian Named Entity Recognizer), in which we apply transfer learning and active learning to NER in Persian, which is regarded as a low-resource language. Like many others, we use a Conditional Random Field (CRF) for tag decoding in our proposed architecture. In supervised learning experiments on two Persian datasets, Arman and Peyma, BERT-PersNER outperformed two existing Persian NER studies in most cases. In addition, as the very first effort to try active learning in Persian NER, using only 30% of Arman and 20% of Peyma we reached 92.15% and 92.41%, respectively, of the performance obtained in those supervised learning experiments.
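The abstract mentions a Conditional Random Field for tag decoding without giving details. As a rough illustration of what CRF decoding usually involves, the following is a minimal Viterbi sketch in plain Python. It is not the authors' implementation: the function name `viterbi_decode`, the three-tag set, and all scores are hypothetical, and start and end transition scores are omitted for brevity.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for one sentence.

    emissions[t][j]   : score of tag j at token t (e.g. produced by an encoder).
    transitions[i][j] : score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # Best score of any path ending in each tag at the first token.
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score = []
        pointers = []
        for j in range(num_tags):
            # Best previous tag for reaching tag j at token t.
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[t][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag.
    best_last = max(range(num_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set: 0 = O, 1 = B-PER, 2 = I-PER.
# The only non-zero transition penalizes the invalid move O -> I-PER.
transitions = [
    [0.0, 0.0, -100.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
# Made-up emission scores for a two-token sentence.
emissions = [
    [2.0, 1.5, 0.0],
    [0.0, 0.0, 2.0],
]
decoded = viterbi_decode(emissions, transitions)
print(decoded)
```

With these made-up scores, choosing the best tag per token independently would give O followed by I-PER, which is not a valid sequence. Scoring the sequence as a whole lets the transition penalty steer the decoder to B-PER followed by I-PER, which is the usual motivation for placing a CRF layer on top of per-token scores.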
Anthology ID:
2021.ranlp-1.73
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
647–654
URL:
https://aclanthology.org/2021.ranlp-1.73
Cite (ACL):
Farane Jalali Farahani and Gholamreza Ghassem-Sani. 2021. BERT-PersNER: A New Model for Persian Named Entity Recognition. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 647–654, Held Online. INCOMA Ltd.
Cite (Informal):
BERT-PersNER: A New Model for Persian Named Entity Recognition (Jalali Farahani & Ghassem-Sani, RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-1.73.pdf