Short-form verbal arts as a speech data resource in the field

Matthew Faytak, Tianle Yang, Pius Wuchu Akumbu, Ivo Forghema Njuasi, Éric Le Ferrand


Abstract
We propose an efficient method for collecting speech resource data in the field that leverages short-form verbal arts, namely riddles and proverbs. Because these utterances are naturalistic but conventionalized, a predictable transcript can be assigned to them. As a proof of concept, we describe a 5.25-hour corpus of proverbs and riddles collected for Kom, a low-resource language of Cameroon, and conduct ASR modeling experiments on it. Results suggest that the method yields high-quality speech data, albeit with relatively low lexical diversity. We highlight how the collected data align with community priorities for cultural education and preservation in the Cameroonian context.
Anthology ID:
2026.fieldmatters-1.5
Volume:
Proceedings of the Fifth Workshop on NLP Applications to Field Linguistics
Month:
March
Year:
2026
Address:
Rabat, Morocco
Venues:
FieldMatters | WS
Publisher:
Association for Computational Linguistics
Pages:
38–45
URL:
https://aclanthology.org/2026.fieldmatters-1.5/
Cite (ACL):
Matthew Faytak, Tianle Yang, Pius Wuchu Akumbu, Ivo Forghema Njuasi, and Éric Le Ferrand. 2026. Short-form verbal arts as a speech data resource in the field. In Proceedings of the Fifth Workshop on NLP Applications to Field Linguistics, pages 38–45, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Short-form verbal arts as a speech data resource in the field (Faytak et al., FieldMatters 2026)
PDF:
https://aclanthology.org/2026.fieldmatters-1.5.pdf