Alena Witzlack-Makarevich
2025
Universal Dependencies Treebank for Khoekhoe (KDT)
Kira Tulchynska | Sylvanus Job | Alena Witzlack-Makarevich
Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)
This paper reports on the development of the first dependency treebank for Khoekhoe (KDT). Khoekhoe (Khoe-Kwadi, Namibia) is a low-resource language with few publicly available linguistic and computational resources. The treebank consists of 29k words across six texts taken from various registers and includes a substantial portion of spoken conversational data. The sentences were annotated manually according to the Universal Dependencies framework. In this paper, apart from presenting the strategies followed to create the treebank, we also discuss some challenging morphological features and syntactic constructions found in the corpus and outline how we have handled them using the current Universal Dependencies specification.
2022
Overlooked Data in Typological Databases: What Grambank Teaches Us About Gaps in Grammars
Jakob Lesage | Hannah J. Haynie | Hedvig Skirgård | Tobias Weber | Alena Witzlack-Makarevich
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Typological databases can contain a wealth of information beyond the collection of linguistic properties across languages. This paper shows how information often overlooked in typological databases can inform the research community about the state of description of the world’s languages. We illustrate this using Grambank, a morphosyntactic typological database covering 2,467 language varieties and based on 3,951 grammatical descriptions. We classify and quantify the comments that accompany coded values in Grambank. We then aggregate these comments and the coded values to derive a level of description for 17 grammatical domains that Grambank covers (negation, adnominal modification, participant marking, tense, aspect, etc.). We show that the description level of grammatical domains varies across space and time. Information about gaps and uncertainties in the descriptive knowledge of grammatical domains within and across languages is essential for a correct analysis of data in typological databases and for the study of grammatical diversity more generally. When collected in a database, such information feeds into disciplines that focus on primary data collection, such as grammaticography and language documentation.
Co-authors
- Hannah J. Haynie 1
- Sylvanus Job 1
- Jakob Lesage 1
- Hedvig Skirgård 1
- Kira Tulchynska 1