A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora

Christian Bentz, Tatyana Ruzsics, Alexander Koplenig, Tanja Samardžić


Abstract
Language complexity is an intriguing phenomenon argued to play an important role in both language learning and processing. The need to compare languages with regard to their complexity has resulted in a multitude of approaches and methods, ranging from accounts targeting specific structural features to global quantification of variation more generally. In this paper, we investigate the degree to which morphological complexity measures are mutually correlated in a sample of more than 500 languages from 101 language families. We use human expert judgements from the World Atlas of Language Structures (WALS), and compare them to four quantitative measures automatically calculated from language corpora. These consist of three previously defined corpus-derived measures, which are all monolingual, and one new measure based on automatic word-alignment across pairs of languages. We find strong correlations between all the measures, illustrating that both expert judgements and automated approaches converge on similar complexity ratings, and can be used interchangeably.
Anthology ID:
W16-4117
Volume:
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Dominique Brunato, Felice Dell’Orletta, Giulia Venturi, Thomas François, Philippe Blache
Venue:
CL4LC
Publisher:
The COLING 2016 Organizing Committee
Pages:
142–153
URL:
https://aclanthology.org/W16-4117
Cite (ACL):
Christian Bentz, Tatyana Ruzsics, Alexander Koplenig, and Tanja Samardžić. 2016. A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora. In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), pages 142–153, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora (Bentz et al., CL4LC 2016)
PDF:
https://aclanthology.org/W16-4117.pdf