Learning Cross-Dialectal Morphophonology with Syllable Structure Constraints

Salam Khalifa, Abdelrahim Qaddoumi, Jordan Kodner, Owen Rambow


Abstract
We investigate learning surface forms from underlying morphological forms for low-resource language varieties. We concentrate on learning explicit rules with the aid of learned syllable structure constraints, which outperforms neural methods on this small data task and provides interpretable output. Evaluating across one relatively high-resource and two related low-resource Arabic dialects, we find that a model trained only on the high-resource dialect achieves decent performance on the low-resource dialects, useful when no low-resource training data is available. The best results are obtained when our system is trained only on the low-resource dialect data without augmentation from the related higher-resource dialect. We discuss the impact of syllable structure constraints and the strengths and weaknesses of data augmentation and transfer learning from a related dialect.
Anthology ID: 2025.vardial-1.12
Volume: Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Venues: VarDial | WS
Publisher: Association for Computational Linguistics
Pages: 157–167
URL: https://aclanthology.org/2025.vardial-1.12/
Cite (ACL): Salam Khalifa, Abdelrahim Qaddoumi, Jordan Kodner, and Owen Rambow. 2025. Learning Cross-Dialectal Morphophonology with Syllable Structure Constraints. In Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 157–167, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Learning Cross-Dialectal Morphophonology with Syllable Structure Constraints (Khalifa et al., VarDial 2025)
PDF: https://aclanthology.org/2025.vardial-1.12.pdf