Bringing Together Version Control and Quality Assurance of Language Data with LAMA

Aleksandr Riaposov, Elena Lazarenko, Timm Lehmberg


Abstract
This contribution reports on work in progress on project-specific software and digital infrastructure components used in corpus curation workflows within the framework of the long-term language documentation project INEL. By bringing together scientists with different levels of technical affinity in a highly interdisciplinary working environment, the project is confronted with numerous workflow-related issues. Many of them result from collaborative (remote) work on digital corpora, which includes, among other things, annotation and glossing as well as quality and consistency control. In this context, several steps were taken to bridge the gap between usability and the requirements of complex data curation workflows. On one side stand components of such workflows, such as a versioning system and semi-automated data validators; on the other, user demands for simplicity and minimalism. By wrapping a simple shell script in an interactive graphical user interface, we improve the efficacy of data versioning and of the integration of Java-based quality control and validation tools.
Anthology ID:
2022.eurali-1.6
Volume:
Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Atul Kr. Ojha, Sina Ahmadi, Chao-Hong Liu, John P. McCrae
Venue:
EURALI
Publisher:
European Language Resources Association
Pages:
36–41
URL:
https://aclanthology.org/2022.eurali-1.6
Cite (ACL):
Aleksandr Riaposov, Elena Lazarenko, and Timm Lehmberg. 2022. Bringing Together Version Control and Quality Assurance of Language Data with LAMA. In Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference, pages 36–41, Marseille, France. European Language Resources Association.
Cite (Informal):
Bringing Together Version Control and Quality Assurance of Language Data with LAMA (Riaposov et al., EURALI 2022)
PDF:
https://aclanthology.org/2022.eurali-1.6.pdf