Lorenzo Tosi
2019
Initial Experiments In Cross-Lingual Morphological Analysis Using Morpheme Segmentation
Vladislav Mikhailov
|
Lorenzo Tosi
|
Anastasia Khorosheva
|
Oleg Serikov
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
The paper describes initial experiments in data-driven cross-lingual morphological analysis of open-category words using a combination of unsupervised morpheme segmentation, annotation projection and an LSTM encoder-decoder model with attention. Our algorithm provides lemmatisation and morphological analysis generation for previously unseen low-resource language surface forms with only annotated data on the related languages given. Despite the inherently lossy annotation projection, we achieved the best lemmatisation F1-score in the VarDial 2019 Shared Task on Cross-Lingual Morphological Analysis for both Karachay-Balkar (Turkic languages, agglutinative morphology) and Sardinian (Romance languages, fusional morphology).
Search