Shallow Parsing for Nepal Bhasa Complement Clauses

Borui Zhang, Abe Kazemzadeh, Brian Reese


Abstract
Accelerating data collection, annotation, and analysis is an urgent need in linguistic fieldwork and the documentation of endangered languages (Bird, 2009). Our experiments describe how we maximize the quality of a chunking model for Nepal Bhasa syntactic complement structures. Native-speaker language consultants were trained to annotate a minimally selected raw data set (Suárez et al., 2019), marking embedded clauses, matrix verbs, and embedded verbs. We apply both statistical training algorithms and transfer learning, including Naive Bayes, MaxEnt, and fine-tuning the pre-trained mBERT model (Devlin et al., 2018). We show that even with limited annotated data, the resulting model is sufficient for the task. The modeling resources we used are widely available for many other endangered languages, and the practice is easy to replicate when training shallow parsers for other endangered languages.
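The abstract describes chunking complement structures with, among other models, a Naive Bayes tagger over BIO-style labels for embedded clauses (EC), matrix verbs (MV), and embedded verbs (EV). As a minimal sketch of that statistical setup, the following uses invented English-like toy sentences and a single token-identity feature with add-one smoothing; the actual Nepal Bhasa data, feature set, and label inventory are assumptions, not reproduced from the paper.

```python
from collections import defaultdict
import math

# Hypothetical toy data: (token, BIO tag) pairs marking a matrix verb (MV),
# embedded clause (EC), and embedded verb (EV), loosely mirroring the
# annotation scheme named in the abstract.
train = [
    [("she", "O"), ("said", "B-MV"), ("that", "B-EC"), ("he", "I-EC"), ("left", "B-EV")],
    [("they", "O"), ("think", "B-MV"), ("that", "B-EC"), ("we", "I-EC"), ("won", "B-EV")],
]

# Count tag frequencies and per-tag word frequencies.
tag_counts = defaultdict(int)
word_given_tag = defaultdict(lambda: defaultdict(int))
vocab = set()
for sent in train:
    for word, tag in sent:
        tag_counts[tag] += 1
        word_given_tag[tag][word] += 1
        vocab.add(word)

total_tokens = sum(tag_counts.values())

def predict(word):
    """Most probable BIO tag for a token under a Naive Bayes model
    with add-one (Laplace) smoothing on the emission probabilities."""
    best_tag, best_lp = None, float("-inf")
    for tag, count in tag_counts.items():
        lp = math.log(count / total_tokens)  # log prior P(tag)
        lp += math.log((word_given_tag[tag][word] + 1)
                       / (count + len(vocab) + 1))  # smoothed log P(word|tag)
        if lp > best_lp:
            best_tag, best_lp = tag, lp
    return best_tag

print([predict(w) for w in ["she", "said", "that", "he", "left"]])
# → ['O', 'B-MV', 'B-EC', 'I-EC', 'B-EV']
```

A richer feature set (neighboring tokens, affixes) and a sequence model would be the natural next step; this sketch only illustrates the per-token classification framing that the statistical baselines share.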
Anthology ID:
2022.computel-1.8
Volume:
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venues:
ACL | ComputEL
Publisher:
Association for Computational Linguistics
Pages:
61–67
URL:
https://aclanthology.org/2022.computel-1.8
DOI:
10.18653/v1/2022.computel-1.8
Cite (ACL):
Borui Zhang, Abe Kazemzadeh, and Brian Reese. 2022. Shallow Parsing for Nepal Bhasa Complement Clauses. In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 61–67, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Shallow Parsing for Nepal Bhasa Complement Clauses (Zhang et al., ComputEL 2022)
PDF:
https://aclanthology.org/2022.computel-1.8.pdf