How Well Can BERT Learn the Grammar of an Agglutinative and Flexible-Order Language? The Case of Basque.

Gorka Urbizu, Muitze Zulaika, Xabier Saralegi, Ander Corral


Abstract
This work investigates the acquisition of formal linguistic competence by neural language models, hypothesizing that languages with complex grammar, such as Basque, present substantial challenges during the pre-training phase. Basque is distinguished by its complex morphology and flexible word order, potentially complicating grammar extraction. In our analysis, we evaluated the grammatical knowledge of BERT models trained under various pre-training configurations, considering factors such as corpus size, model size, number of epochs, and the use of lemmatization. To assess this grammatical knowledge, we constructed the BL2MP (Basque L2 student-based Minimal Pairs) test set. This test set consists of minimal pairs, each containing both a grammatically correct and an incorrect sentence, sourced from essays authored by students at different proficiency levels in the Basque language. Additionally, our analysis explores the difficulties in learning various grammatical phenomena, the challenges posed by flexible word order, and the influence of the student’s proficiency level on the difficulty of correcting grammar errors.
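As an illustration of how a minimal-pair test set of this kind is typically scored, the sketch below counts a pair as solved when the model assigns the grammatical sentence a strictly higher score than its ungrammatical counterpart. The abstract does not specify the scoring function used in the paper, so `score` is left as a caller-supplied assumption (for a BERT model it might be a pseudo-log-likelihood). The names `minimal_pair_accuracy`, `toy_score`, and `PAIRS` are hypothetical, and the two Basque pairs are made-up illustrations, not items from BL2MP.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (correct, incorrect) pairs where the correct
    sentence receives a strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    wins = sum(1 for good, bad in pairs if score(good) > score(bad))
    return wins / len(pairs)


# Hypothetical stand-in for a model scorer: it naively penalises any
# sentence containing the plural auxiliary "dira".
def toy_score(sentence: str) -> float:
    return -1.0 if "dira" in sentence else 0.0


# Illustrative (correct, incorrect) pairs; NOT taken from BL2MP.
PAIRS = [
    ("Neska etorri da", "Neska etorri dira"),      # singular subject
    ("Gizonak etorri dira", "Gizonak etorri da"),  # plural subject
]

accuracy = minimal_pair_accuracy(PAIRS, toy_score)
```

With this toy scorer the first pair is solved and the second is not, so `accuracy` is 0.5. A real evaluation would replace `toy_score` with a function that queries the language model.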
Anthology ID:
2024.lrec-main.731
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
8334–8348
URL:
https://aclanthology.org/2024.lrec-main.731
Cite (ACL):
Gorka Urbizu, Muitze Zulaika, Xabier Saralegi, and Ander Corral. 2024. How Well Can BERT Learn the Grammar of an Agglutinative and Flexible-Order Language? The Case of Basque. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 8334–8348, Torino, Italia. ELRA and ICCL.
Cite (Informal):
How Well Can BERT Learn the Grammar of an Agglutinative and Flexible-Order Language? The Case of Basque. (Urbizu et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.731.pdf