Investigating phonological theories with crowd-sourced data: The Inventory Size Hypothesis in the light of Lingua Libre

Mathilde Hutin, Marc Allassonnière-Tang


Abstract
Data-driven research in phonetics and phonology relies heavily on spoken-language resources and on access to them. We explore a question in comparative linguistics using Lingua Libre, Wikimedia’s participatory linguistic library and an open-source crowd-sourced corpus, to show that such corpora may offer a solution to typologists wishing to study many languages at once. As a proof of concept, we compare the realizations of Italian and Spanish vowels (sample size = 5000) to investigate whether vowel production is influenced by the size of the phonemic inventory (the Inventory Size Hypothesis), by the exact shape of the inventory (the Vowel Quality Hypothesis), or by neither. Results show that the size of the inventory does not seem to influence vowel production, in line with previous research, but that the shape of the inventory may well be a factor determining the extent of variation in vowel production. Above all, these results show that Lingua Libre has the potential to provide valuable data for linguistic inquiry.
Anthology ID:
2022.sigmorphon-1.3
Volume:
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Month:
July
Year:
2022
Address:
Seattle, Washington
Editors:
Garrett Nicolai, Eleanor Chodroff
Venue:
SIGMORPHON
SIG:
SIGMORPHON
Publisher:
Association for Computational Linguistics
Pages:
23–28
URL:
https://aclanthology.org/2022.sigmorphon-1.3
DOI:
10.18653/v1/2022.sigmorphon-1.3
Cite (ACL):
Mathilde Hutin and Marc Allassonnière-Tang. 2022. Investigating phonological theories with crowd-sourced data: The Inventory Size Hypothesis in the light of Lingua Libre. In Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 23–28, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
Investigating phonological theories with crowd-sourced data: The Inventory Size Hypothesis in the light of Lingua Libre (Hutin & Allassonnière-Tang, SIGMORPHON 2022)
PDF:
https://aclanthology.org/2022.sigmorphon-1.3.pdf