Compounds in Universal Dependencies: A Survey in Five European Languages

Emil Svoboda, Magda Ševčíková


Abstract
In Universal Dependencies, compounds, which we understand as words containing two or more roots, are represented according to tokenization, which reflects the orthographic conventions of the language. A closed compound (e.g. waterfall) corresponds to a single word in Universal Dependencies, while a hyphenated compound (father-in-law) and an open compound (apple pie) correspond to multiple words. The aim of this paper is to open a discussion on how to move towards a more consistent annotation of compounds. The solution we argue for is to represent the internal structure of all compound types analogously to syntactic phrases, which would not only increase the comparability of compounding within and across languages, but also allow comparisons of compounds and syntactic phrases.
Anthology ID:
2024.sigtyp-1.12
Volume:
Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
March
Year:
2024
Address:
St. Julian's, Malta
Editors:
Michael Hahn, Alexey Sorokin, Ritesh Kumar, Andreas Shcherbakov, Yulia Otmakhova, Jinrui Yang, Oleg Serikov, Priya Rani, Edoardo M. Ponti, Saliha Muradoğlu, Rena Gao, Ryan Cotterell, Ekaterina Vylomova
Venues:
SIGTYP | WS
Publisher:
Association for Computational Linguistics
Pages:
88–99
URL:
https://aclanthology.org/2024.sigtyp-1.12
Cite (ACL):
Emil Svoboda and Magda Ševčíková. 2024. Compounds in Universal Dependencies: A Survey in Five European Languages. In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 88–99, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal):
Compounds in Universal Dependencies: A Survey in Five European Languages (Svoboda & Ševčíková, SIGTYP-WS 2024)
PDF:
https://aclanthology.org/2024.sigtyp-1.12.pdf