Dealing with dialectal variation in the construction of the Basque historical corpus

Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze


Abstract
This paper analyses the challenge of handling dialectal variation when semi-automatically normalising and analysing historical Basque texts. The work is part of a broader ongoing project to build a morphosyntactically annotated historical corpus of Basque, called Basque in the Making (BIM): A Historical Look at a European Language Isolate, whose main objective is the systematic and diachronic study of a number of grammatical features. The corpus will be not only the first tagged corpus of historical Basque, but also a means of improving language processing tools by analysing historical Basque varieties that are more or less distant from present-day standard Basque.
Anthology ID:
2020.vardial-1.8
Volume:
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venues:
COLING | VarDial
Publisher:
International Committee on Computational Linguistics (ICCL)
Pages:
79–89
URL:
https://aclanthology.org/2020.vardial-1.8
Cite (ACL):
Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, and Ander Soraluze. 2020. Dealing with dialectal variation in the construction of the Basque historical corpus. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 79–89, Barcelona, Spain (Online). International Committee on Computational Linguistics (ICCL).
Cite (Informal):
Dealing with dialectal variation in the construction of the Basque historical corpus (Estarrona et al., VarDial 2020)
PDF:
https://aclanthology.org/2020.vardial-1.8.pdf