Providing a Catalogue of Language Resources for Commercial Users

Bente Maegaard, Lina Henriksen, Andrew Joscelyne, Vesna Lusicky, Margaretha Mazura, Sussi Olsen, Claus Povlsen, Philippe Wacker


Abstract
Language resources (LRs) are indispensable for the development of tools for machine translation (MT) and various kinds of computer-assisted translation (CAT). In particular, language corpora, both parallel and monolingual, are considered the most important resources, for instance for MT: not only statistical MT (SMT) but also hybrid MT. The Language Technology Observatory will provide easy access to information about LRs deemed useful for MT and other translation tools through its LR Catalogue. To determine which aspects of an LR are useful for MT practitioners, a user study was conducted, providing a guide to the most relevant metadata and the most relevant quality criteria. We have seen that many resources exist which are useful for MT and similar work, but the majority are for (academic) research or educational use only, and as such are not available for commercial use. Our work has revealed four gaps: a coverage gap, an awareness gap, a quality gap, and a quantity gap. The paper ends with recommendations for a forward-looking strategy.
Anthology ID:
L16-1072
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
449–456
URL:
https://aclanthology.org/L16-1072
Cite (ACL):
Bente Maegaard, Lina Henriksen, Andrew Joscelyne, Vesna Lusicky, Margaretha Mazura, Sussi Olsen, Claus Povlsen, and Philippe Wacker. 2016. Providing a Catalogue of Language Resources for Commercial Users. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 449–456, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Providing a Catalogue of Language Resources for Commercial Users (Maegaard et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1072.pdf