Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging

Ehsan Doostmohammadi, Minoo Nassajian, Adel Rahimi


Abstract
Ezafe is a grammatical particle in some Iranian languages that links two words together. Despite the important information it conveys, it is almost never indicated in Persian script, which leads to mistakes in reading complex sentences and to errors in natural language processing tasks. In this paper, we experiment with different machine learning methods to achieve state-of-the-art results in the task of ezafe recognition. Transformer-based methods, BERT and XLMRoBERTa, achieve the best results, with the latter outperforming the previous state of the art by 2.68% F1-score. Moreover, we use ezafe information to improve Persian part-of-speech tagging results, show that such information is not useful to transformer-based methods, and explain why that might be the case.
Anthology ID:
2020.findings-emnlp.86
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
961–971
URL:
https://aclanthology.org/2020.findings-emnlp.86
DOI:
10.18653/v1/2020.findings-emnlp.86
Cite (ACL):
Ehsan Doostmohammadi, Minoo Nassajian, and Adel Rahimi. 2020. Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 961–971, Online. Association for Computational Linguistics.
Cite (Informal):
Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging (Doostmohammadi et al., Findings 2020)
PDF:
https://aclanthology.org/2020.findings-emnlp.86.pdf
Code:
edoost/pert