Initial Experiments In Cross-Lingual Morphological Analysis Using Morpheme Segmentation

Vladislav Mikhailov, Lorenzo Tosi, Anastasia Khorosheva, Oleg Serikov


Abstract
The paper describes initial experiments in data-driven cross-lingual morphological analysis of open-category words using a combination of unsupervised morpheme segmentation, annotation projection, and an LSTM encoder-decoder model with attention. Given only annotated data for related languages, our algorithm performs lemmatisation and generates morphological analyses for previously unseen surface forms of a low-resource language. Despite the inherently lossy annotation projection, we achieved the best lemmatisation F1-score in the VarDial 2019 Shared Task on Cross-Lingual Morphological Analysis for both Karachay-Balkar (a Turkic language with agglutinative morphology) and Sardinian (a Romance language with fusional morphology).
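To make the idea of projecting annotations through morpheme segments concrete, here is a toy sketch. It is not the authors' implementation. It assumes a word has already been split into stem plus suffixes by some unsupervised segmenter, and that a hypothetical table `SUFFIX_TAGS` maps suffixes seen in an annotated related language to UniMorph-style feature tags. The function name `project_analysis` and the Turkish-style suffixes (-ler/-lar plural, -de/-da locative) are illustrative assumptions only.

```python
from typing import Dict, List, Tuple

# Hypothetical projection table: suffixes observed in an annotated related
# language, mapped to UniMorph-style feature tags (toy values, not task data).
SUFFIX_TAGS: Dict[str, str] = {
    "ler": "PL", "lar": "PL",  # plural (vowel-harmony variants)
    "de": "LOC", "da": "LOC",  # locative
}


def project_analysis(segments: List[str], pos: str = "N") -> Tuple[str, str]:
    """Return (lemma, tag string) for a word already split into morphemes.

    Assumes segments[0] is the stem and that it doubles as the lemma,
    a simplification that roughly holds for agglutinative nouns.
    """
    lemma = segments[0]
    feats = [pos]
    for suffix in segments[1:]:
        tag = SUFFIX_TAGS.get(suffix)
        if tag is None:
            # No annotation to project for this morpheme: information is lost.
            continue
        feats.append(tag)
    return lemma, ";".join(feats)


print(project_analysis(["ev", "ler", "de"]))  # Turkish "evlerde", 'in the houses'
# prints ('ev', 'N;PL;LOC')
```

Suffixes missing from the table are silently dropped, which is one simple way such a projection becomes lossy. Treating the first segment as the lemma suits agglutinative morphology far better than fusional morphology such as Sardinian's, where a learned model (here, the paper's LSTM encoder-decoder) has more work to do.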
Anthology ID: W19-1415
Volume: Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Month: June
Year: 2019
Address: Ann Arbor, Michigan
Editors: Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
Venue: VarDial
Publisher: Association for Computational Linguistics
Pages: 144–152
URL: https://aclanthology.org/W19-1415
DOI: 10.18653/v1/W19-1415
Cite (ACL): Vladislav Mikhailov, Lorenzo Tosi, Anastasia Khorosheva, and Oleg Serikov. 2019. Initial Experiments In Cross-Lingual Morphological Analysis Using Morpheme Segmentation. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 144–152, Ann Arbor, Michigan. Association for Computational Linguistics.
Cite (Informal): Initial Experiments In Cross-Lingual Morphological Analysis Using Morpheme Segmentation (Mikhailov et al., VarDial 2019)
PDF: https://aclanthology.org/W19-1415.pdf