Can a Neural Model Guide Fieldwork? A Case Study on Morphological Data Collection

Aso Mahmudi, Borja Herce, Demian Inostroza Améstica, Andreas Scherbakov, Eduard H. Hovy, Ekaterina Vylomova


Abstract
Linguistic fieldwork is an important component of language documentation and of the creation of comprehensive linguistic corpora. Despite its significance, the process is often lengthy and exhausting. This paper presents a model that guides a linguist during fieldwork and accounts for the dynamics of linguist-speaker interactions. We introduce a novel framework that evaluates the efficiency of various sampling strategies for obtaining morphological data and assesses how effectively state-of-the-art neural models generalise morphological structures. Our experiments highlight two key strategies for improving efficiency: (1) increasing the diversity of annotated data by sampling uniformly among the cells of the paradigm tables, and (2) using model confidence as a guide to enhance positive interaction by providing reliable predictions during annotation.
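To make strategy (1) concrete, the following is a minimal illustrative sketch of uniform sampling over paradigm cells. It is not the authors' implementation: the function name `uniform_cell_sample`, the representation of an unannotated form as a `(lemma, cell)` pair, and the round-robin scheme are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def uniform_cell_sample(unlabelled, k, seed=0):
    """Pick up to k (lemma, cell) pairs, spreading picks evenly over cells.

    `unlabelled` is a hypothetical pool of (lemma, cell) pairs, where `cell`
    names a paradigm cell such as "1SG" or "3PL". This layout is assumed
    for the sketch, not taken from the paper.
    """
    rng = random.Random(seed)

    # Group candidates by paradigm cell.
    by_cell = defaultdict(list)
    for lemma, cell in unlabelled:
        by_cell[cell].append((lemma, cell))

    # Shuffle within each cell so the lemma chosen for a cell is random.
    for items in by_cell.values():
        rng.shuffle(items)

    cells = sorted(by_cell)
    picked = []
    # Round-robin over cells: while every cell still has candidates,
    # no cell is sampled more than once more often than any other.
    while len(picked) < k and any(by_cell[c] for c in cells):
        for c in cells:
            if by_cell[c] and len(picked) < k:
                picked.append(by_cell[c].pop())
    return picked


if __name__ == "__main__":
    pool = [(f"lemma{i}", cell)
            for i in range(10)
            for cell in ("1SG", "2SG", "3SG")]
    for item in uniform_cell_sample(pool, 6):
        print(item)
```

Sampling forms at random from the whole pool would instead favour whichever cells happen to be most frequent in it; the round-robin keeps coverage of the paradigm table balanced.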
Anthology ID:
2025.bucc-1.8
Volume:
Proceedings of the 18th Workshop on Building and Using Comparable Corpora (BUCC)
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Serge Sharoff, Ayla Rigouts Terryn, Pierre Zweigenbaum, Reinhard Rapp
Venues:
BUCC | WS
Publisher:
Association for Computational Linguistics
Pages:
62–72
URL:
https://aclanthology.org/2025.bucc-1.8/
Cite (ACL):
Aso Mahmudi, Borja Herce, Demian Inostroza Améstica, Andreas Scherbakov, Eduard H. Hovy, and Ekaterina Vylomova. 2025. Can a Neural Model Guide Fieldwork? A Case Study on Morphological Data Collection. In Proceedings of the 18th Workshop on Building and Using Comparable Corpora (BUCC), pages 62–72, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Can a Neural Model Guide Fieldwork? A Case Study on Morphological Data Collection (Mahmudi et al., BUCC 2025)
PDF:
https://aclanthology.org/2025.bucc-1.8.pdf