Strategies to Improve Low-Resource Agglutinative Languages Morphological Inflection

Gulinigeer Abudouwaili, Wayit Ablez, Kahaerjiang Abiderexiti, Aishan Wumaier, Nian Yi


Abstract
Morphological inflection is a crucial task in the field of morphology and is typically treated as a sequence transduction task. In recent years, it has received substantial attention from researchers, and models have achieved impressive performance for both high- and low-resource languages. However, when the distribution of instances in the training dataset changes, or when the model must handle novel lemmas or feature labels, its accuracy declines. In agglutinative languages, morphological inflection involves phonological phenomena while generating new words, which can alter the syllable patterns at the boundary between the lemma and the suffixes. This paper proposes four strategies for low-resource agglutinative languages to enhance the model’s generalization ability. First, a convolution module extracts syllable-like units from lemmas, allowing the model to learn syllable features. Second, the lemma and feature labels are represented separately in the input, and the position encoding of the feature labels is marked so that the model learns the order between suffixes and labels. Third, the model recognizes the common substrings in lemmas through two special characters and copies them into the inflected words. Finally, we improve the data augmentation method by incorporating syllable features. A series of experiments shows that the proposed model outperforms the baseline models.
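The third strategy, marking common substrings with two special characters so they can be copied, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the shared substring is found or which characters are used, so the sketch assumes the longest common substring of the lemma and the inflected form, and the marker characters `<` and `>` and the function name `mark_common_substring` are hypothetical. The Turkish word pairs are illustrative and not taken from the paper.

```python
from difflib import SequenceMatcher
from typing import Tuple

# Hypothetical marker characters; the abstract does not name the two special characters.
COPY_START = "<"
COPY_END = ">"


def mark_common_substring(lemma: str, form: str) -> Tuple[str, str]:
    """Wrap the longest substring shared by lemma and form in marker characters.

    Assumption: "common substring" is taken to mean the longest contiguous
    match between the two strings. If nothing is shared, both are returned
    unchanged.
    """
    matcher = SequenceMatcher(None, lemma, form, autojunk=False)
    match = matcher.find_longest_match(0, len(lemma), 0, len(form))
    if match.size == 0:
        return lemma, form

    def wrap(text: str, start: int) -> str:
        end = start + match.size
        return text[:start] + COPY_START + text[start:end] + COPY_END + text[end:]

    return wrap(lemma, match.a), wrap(form, match.b)


if __name__ == "__main__":
    # Stem-final consonant change at the lemma/suffix boundary: kitap -> kitabım
    print(mark_common_substring("kitap", "kitabım"))
    # Plain suffixation: ev -> evler
    print(mark_common_substring("ev", "evler"))
```

In the first pair only `kita` is marked, because the stem-final consonant changes at the suffix boundary. This is the kind of boundary alternation the abstract describes, and a copy mechanism would still have to generate it rather than copy it.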
Anthology ID:
2023.conll-1.34
Volume:
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)
Month:
December
Year:
2023
Address:
Singapore
Editors:
Jing Jiang, David Reitter, Shumin Deng
Venue:
CoNLL
Publisher:
Association for Computational Linguistics
Pages:
508–520
URL:
https://aclanthology.org/2023.conll-1.34
DOI:
10.18653/v1/2023.conll-1.34
Cite (ACL):
Gulinigeer Abudouwaili, Wayit Ablez, Kahaerjiang Abiderexiti, Aishan Wumaier, and Nian Yi. 2023. Strategies to Improve Low-Resource Agglutinative Languages Morphological Inflection. In Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL), pages 508–520, Singapore. Association for Computational Linguistics.
Cite (Informal):
Strategies to Improve Low-Resource Agglutinative Languages Morphological Inflection (Abudouwaili et al., CoNLL 2023)
PDF:
https://aclanthology.org/2023.conll-1.34.pdf