Adel Rahimi

2020

Joint Persian Word Segmentation Correction and Zero-Width Non-Joiner Recognition Using BERT
Ehsan Doostmohammadi | Minoo Nassajian | Adel Rahimi
Proceedings of the 28th International Conference on Computational Linguistics

Words are properly segmented in the Persian writing system; in practice, however, these writing rules are often neglected, resulting in single words being written disjointedly and multiple words written without any white spaces between them. This paper addresses the problems of word segmentation and zero-width non-joiner (ZWNJ) recognition in Persian, which we approach jointly as a sequence labeling problem. We achieved a macro-averaged F1-score of 92.40% on a carefully collected corpus of 500 sentences with a high level of difficulty.


Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging
Ehsan Doostmohammadi | Minoo Nassajian | Adel Rahimi
Findings of the Association for Computational Linguistics: EMNLP 2020

Ezafe is a grammatical particle in some Iranian languages that links two words together. Despite the important information it conveys, it is almost never indicated in Persian script, resulting in mistakes in reading complex sentences and errors in natural language processing tasks. In this paper, we experiment with different machine learning methods to achieve state-of-the-art results in the task of ezafe recognition. Transformer-based methods, BERT and XLM-RoBERTa, achieve the best results, with the latter exceeding the previous state of the art by 2.68% in F1-score. Moreover, we use ezafe information to improve Persian part-of-speech tagging results, show that such information is not useful to transformer-based methods, and explain why that might be the case.

Venues

coling (1)
findings (1)
