Universal Feature-based Morphological Trees

Federica Gamba, Abishek Stephen, Zdeněk Žabokrtský


Abstract
The paper proposes a novel data representation inspired by Universal Dependencies (UD) syntactic trees, which are extended to capture the internal morphological structure of word forms. As a result, morphological segmentation is incorporated into the UD representation of syntactic dependencies. To derive the proposed data structure, we leverage existing annotation of UD treebanks as well as available segmentation resources, and we select 10 languages for the presented case study. Additionally, statistical analysis reveals a robust correlation between morphs and sets of morphological features of words. We thus align the morphs to the observed feature inventories capturing the morphological meaning of morphs. By exploiting the cross-lingual correspondence of morphs, the proposed syntactic representation based on morphological segmentation enhances the comparability of sentence structures across languages.
Anthology ID:
2024.mwe-1.17
Volume:
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Archna Bhatia, Gosse Bouma, A. Seza Doğruöz, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Joakim Nivre, Alexandre Rademaker
Venues:
MWE | UDW | WS
SIGs:
SIGLEX | SIGPARSE
Publisher:
ELRA and ICCL
Pages:
125–137
URL:
https://aclanthology.org/2024.mwe-1.17
Cite (ACL):
Federica Gamba, Abishek Stephen, and Zdeněk Žabokrtský. 2024. Universal Feature-based Morphological Trees. In Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, pages 125–137, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Universal Feature-based Morphological Trees (Gamba et al., MWE-UDW-WS 2024)
PDF:
https://aclanthology.org/2024.mwe-1.17.pdf