The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources

Georg Rehm


Abstract
Language Resources (LRs) are an essential ingredient of current approaches in Linguistics, Computational Linguistics, Language Technology and related fields. LRs are collections of spoken or written language data, typically annotated with linguistic analysis information. Different types of LRs exist, for example, corpora, ontologies, lexicons, collections of spoken language data (audio), or collections that also include video (multimedia, multimodal). Often, LRs are distributed with specific tools, documentation, manuals or research publications. The different phases that involve creating and distributing an LR can be conceptualised as a life cycle. While the idea of handling the LR production and maintenance process in terms of a life cycle has been brought up quite some time ago, a best practice model or common approach can still be considered a research gap. This article wants to help fill this gap by proposing an initial version of a generic Language Resource Life Cycle that can be used to inform, direct, control and evaluate LR research and development activities (including description, management, production, validation and evaluation workflows).
Anthology ID:
L16-1388
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
2450–2454
Language:
URL:
https://aclanthology.org/L16-1388
DOI:
Bibkey:
Cite (ACL):
Georg Rehm. 2016. The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2450–2454, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources (Rehm, LREC 2016)
Copy Citation:
PDF:
https://aclanthology.org/L16-1388.pdf