Analysing and Validating Language Complexity Metrics Across South American Indigenous Languages

Felipe Serras, Miguel Carpi, Matheus Branco, Marcelo Finger


Abstract
Language complexity is an emerging concept critical for NLP and for quantitative and cognitive approaches to linguistics. In this work, we evaluate the behavior of a set of compression-based language complexity metrics when applied to a large set of native South American languages. Our goal is to validate the desirable properties of such metrics against a more diverse set of languages, guaranteeing the universality of the techniques developed on the basis of this type of theoretical artifact. Our analysis confirmed most propositions about the metrics studied with statistical confidence, affirming their robustness, although the metrics were less stable than when applied to Indo-European languages. We also observed that the trade-off between morphological and syntactic complexities is strongly related to language phylogeny.
Anthology ID:
2024.cmcl-1.13
Volume:
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Tatsuki Kuribayashi, Giulia Rambelli, Ece Takmaz, Philipp Wicke, Yohei Oseki
Venues:
CMCL | WS
Publisher:
Association for Computational Linguistics
Pages:
152–165
URL:
https://aclanthology.org/2024.cmcl-1.13
Cite (ACL):
Felipe Serras, Miguel Carpi, Matheus Branco, and Marcelo Finger. 2024. Analysing and Validating Language Complexity Metrics Across South American Indigenous Languages. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 152–165, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Analysing and Validating Language Complexity Metrics Across South American Indigenous Languages (Serras et al., CMCL-WS 2024)
PDF:
https://aclanthology.org/2024.cmcl-1.13.pdf