Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin

Géraldine Walther, Benoît Sagot


Abstract
In this paper, we present ongoing work on developing language resources and basic NLP tools for an undocumented variety of Romansh, in the context of a language documentation and language acquisition project. Our tools are meant to improve the speed and reliability of corpus annotation for noisy data involving large amounts of code-switching, child speech, and orthographic noise. Increasing the efficiency of language resource development for language documentation and acquisition research also constitutes a step towards solving the data sparsity issues with which researchers have been struggling.
Anthology ID:
W17-2212
Volume:
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Feldman, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Venue:
LaTeCH
SIG:
SIGHUM
Publisher:
Association for Computational Linguistics
Pages:
89–94
URL:
https://aclanthology.org/W17-2212
DOI:
10.18653/v1/W17-2212
Cite (ACL):
Géraldine Walther and Benoît Sagot. 2017. Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin. In Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 89–94, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin (Walther & Sagot, LaTeCH 2017)
PDF:
https://aclanthology.org/W17-2212.pdf