Katarzyna Marszałek-Kowalewska


2021

Discovery of Multiword Expressions with Loanwords and Their Equivalents in the Persian Language
Katarzyna Marszałek-Kowalewska
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

This paper presents an attempt at discovering multiword expressions (MWEs) in the Persian language. It focuses on extracting MWEs that contain lemmas from a particular group: loanwords in Persian and their equivalents proposed by the Academy of Persian Language and Literature. To discover such MWEs, four association measures (AMs) are used and evaluated. Finally, the list of extracted MWEs is analyzed, and a comparison between expressions with loanwords and those with equivalents is presented. To our knowledge, this is the first time such an analysis has been provided for the Persian language.
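The abstract does not name the four association measures, so the following is only a minimal sketch of how one widely used measure, pointwise mutual information (PMI), can rank bigram candidates. The function name `pmi_bigrams`, the `min_count` threshold, and the toy tokens are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Illustrative only: the paper's actual measures, candidate patterns,
    and thresholds are not given in the abstract.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))

    # Highest-scoring pairs first: these are the strongest MWE candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, hypothetical token stream (placeholders, not Persian data).
ranked = pmi_bigrams("a b a b c d c e".split())
```

In practice the input would be lemmatized corpus text, and the ranked list would be cut off at a threshold or a top-n before evaluation.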

The Impact of Text Normalization on Multiword Expressions Discovery in Persian
Katarzyna Marszałek-Kowalewska
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

This paper evaluates normalization procedures for Persian text with respect to a downstream NLP task: multiword expression (MWE) discovery. We discuss the challenges the Persian language poses for NLP and evaluate open-source tools that try to address these difficulties. The best-performing tool is then used in the main task, MWE discovery. To discover MWEs, we use association measures and a subset of the MirasText corpus. The results show that the F-score is 26% higher when the input data is normalized.
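The abstract does not say how the F-score was computed or whether the 26% gain is relative or absolute. As a hedged sketch, the balanced F-score of a discovered MWE list against a gold-standard list could be computed as below; the function name `f_score` and the example expressions are hypothetical.

```python
def f_score(discovered, gold):
    """Balanced F-score (F1) of discovered MWEs against a gold list.

    Illustrative only: the paper's evaluation setup is not described
    in the abstract.
    """
    discovered, gold = set(discovered), set(gold)
    true_positives = len(discovered & gold)
    if true_positives == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = true_positives / len(discovered)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical candidate and gold lists (placeholders, not paper data).
score = f_score(["a b", "c d", "e f"], ["a b", "c d", "g h", "i j"])
```

Running the same function on output from raw and from normalized input would give the two scores that such a comparison relies on.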