Automatic Annotation of Grammaticality in Child-Caregiver Conversations

Mitja Nikolaus; Abhishek Agrawal; Petros Kaklamanis; Alex Warstadt; Abdellah Fourtassi

Automatic Annotation of Grammaticality in Child-Caregiver Conversations

Mitja Nikolaus, Abhishek Agrawal, Petros Kaklamanis, Alex Warstadt, Abdellah Fourtassi

Abstract

The acquisition of grammar has been a central question to adjudicate between theories of language acquisition. In order to conduct faster, more reproducible, and larger-scale corpus studies on grammaticality in child-caregiver conversations, tools for automatic annotation can offer an effective alternative to tedious manual annotation. We propose a coding scheme for context-dependent grammaticality in child-caregiver conversations and annotate more than 4,000 utterances from a large corpus of transcribed conversations. Based on these annotations, we train and evaluate a range of NLP models. Our results show that fine-tuned Transformer-based models perform best, achieving human inter-annotation agreement levels. As a first application and sanity check of this tool, we use the trained models to annotate a corpus almost two orders of magnitude larger than the manually annotated data and verify that children’s grammaticality shows a steady increase with age. This work contributes to the growing literature on applying state-of-the-art NLP methods to help study child language acquisition at scale.

Anthology ID:: 2024.lrec-main.164
Volume:: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:: May
Year:: 2024
Address:: Torino, Italia
Editors:: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:: LREC | COLING
SIG:
Publisher:: ELRA and ICCL
Note:
Pages:: 1832–1844
Language:
URL:: https://aclanthology.org/2024.lrec-main.164
DOI:
Bibkey:
Cite (ACL):: Mitja Nikolaus, Abhishek Agrawal, Petros Kaklamanis, Alex Warstadt, and Abdellah Fourtassi. 2024. Automatic Annotation of Grammaticality in Child-Caregiver Conversations. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 1832–1844, Torino, Italia. ELRA and ICCL.
Cite (Informal):: Automatic Annotation of Grammaticality in Child-Caregiver Conversations (Nikolaus et al., LREC-COLING 2024)
Copy Citation:
PDF:: https://aclanthology.org/2024.lrec-main.164.pdf

PDF Cite Search