Overlooked Data in Typological Databases: What Grambank Teaches Us About Gaps in Grammars

Jakob Lesage, Hannah J. Haynie, Hedvig Skirgård, Tobias Weber, Alena Witzlack-Makarevich


Abstract
Typological databases can contain a wealth of information beyond the collection of linguistic properties across languages. This paper shows how information often overlooked in typological databases can inform the research community about the state of description of the world’s languages. We illustrate this using Grambank, a morphosyntactic typological database covering 2,467 language varieties and based on 3,951 grammatical descriptions. We classify and quantify the comments that accompany coded values in Grambank. We then aggregate these comments and the coded values to derive a level of description for 17 grammatical domains that Grambank covers (negation, adnominal modification, participant marking, tense, aspect, etc.). We show that the description level of grammatical domains varies across space and time. Information about gaps and uncertainties in the descriptive knowledge of grammatical domains within and across languages is essential for a correct analysis of data in typological databases and for the study of grammatical diversity more generally. When collected in a database, such information feeds into disciplines that focus on primary data collection, such as grammaticography and language documentation.
Anthology ID:
2022.lrec-1.309
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2884–2890
URL:
https://aclanthology.org/2022.lrec-1.309
Cite (ACL):
Jakob Lesage, Hannah J. Haynie, Hedvig Skirgård, Tobias Weber, and Alena Witzlack-Makarevich. 2022. Overlooked Data in Typological Databases: What Grambank Teaches Us About Gaps in Grammars. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2884–2890, Marseille, France. European Language Resources Association.
Cite (Informal):
Overlooked Data in Typological Databases: What Grambank Teaches Us About Gaps in Grammars (Lesage et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.309.pdf