On Using Linked Data for Language Resource Sharing in the Long Tail of the Localisation Market

David Lewis, Alexander O’Connor, Andrzej Zydroń, Gerd Sjögren, Rahzeb Choudhury


Abstract
Innovations in localisation have focused on the collection and leverage of language resources. However, smaller localisation clients and Language Service Providers are poorly positioned, compared with larger companies, to exploit the benefits of language resource reuse. Their low throughput of localised content means they have little opportunity to amass significant resources, such as translation memories and terminology databases, to reuse between jobs or to train statistical machine translation engines tailored to their domain specialisms and language pairs. We propose addressing this disadvantage via the sharing and pooling of language resources. However, current localisation standards do not support multiparty sharing, are not well integrated with emerging language resource standards, and do not address key requirements for determining the ownership and license terms of resources. We survey standards and research in the areas of localisation, language resources and language technologies with a view to leveraging existing localisation standards via Linked Data methodologies. This points to the potential of using semantic representations of existing data models for localisation workflow metadata, terminology, parallel text, provenance and access control, which we illustrate with an RDF example.
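The abstract refers to an RDF illustration of shared parallel text with provenance and licence metadata. As a minimal sketch under stated assumptions, and not the authors' own example, the Python below serialises one translation unit as N-Triples, recording its contributor with PROV-O's `prov:wasAttributedTo` and its licence with Dublin Core's `dcterms:license`. The `ex:` namespace, the `TranslationUnit` class, the `sourceText`/`targetText` properties, and the owner and licence IRIs are all hypothetical.

```python
# Hypothetical sketch of one pooled translation unit as RDF (N-Triples).
# Not the RDF example from the paper; the ex: vocabulary is invented for illustration.
from dataclasses import dataclass

EX = "http://example.org/lr/"          # hypothetical namespace
PROV = "http://www.w3.org/ns/prov#"    # W3C PROV-O
DCT = "http://purl.org/dc/terms/"      # Dublin Core terms
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets as N-Triples requires."""
    return f"<{value}>"


def lang_literal(text: str, lang: str) -> str:
    """Build a language-tagged literal, escaping characters N-Triples forbids raw."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


@dataclass
class TranslationUnit:
    unit_id: str
    source_text: str
    source_lang: str
    target_text: str
    target_lang: str
    owner: str      # IRI of the contributing client or LSP (hypothetical)
    licence: str    # IRI of the licence the unit is shared under (hypothetical)


def to_ntriples(tu: TranslationUnit) -> list:
    """Return one N-Triples statement per fact about the unit."""
    s = iri(EX + "tu/" + tu.unit_id)
    triples = [
        (s, iri(RDF_TYPE), iri(EX + "TranslationUnit")),
        (s, iri(EX + "sourceText"), lang_literal(tu.source_text, tu.source_lang)),
        (s, iri(EX + "targetText"), lang_literal(tu.target_text, tu.target_lang)),
        (s, iri(PROV + "wasAttributedTo"), iri(tu.owner)),  # provenance
        (s, iri(DCT + "license"), iri(tu.licence)),         # licence terms
    ]
    return [f"{a} {b} {c} ." for a, b, c in triples]


if __name__ == "__main__":
    unit = TranslationUnit("0001", "Save file", "en", "Datei speichern", "de",
                           "http://example.org/lsp/acme",
                           "http://example.org/licence/pool-terms")
    print("\n".join(to_ntriples(unit)))
```

Plain N-Triples keeps the sketch dependency-free; in practice an RDF library would handle serialisation, and the placeholder vocabulary would be replaced by whatever terminology, parallel-text and access-control models a sharing pool agrees on.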
Anthology ID:
L12-1366
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1403–1409
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/636_Paper.pdf
Cite (ACL):
David Lewis, Alexander O’Connor, Andrzej Zydroń, Gerd Sjögren, and Rahzeb Choudhury. 2012. On Using Linked Data for Language Resource Sharing in the Long Tail of the Localisation Market. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1403–1409, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
On Using Linked Data for Language Resource Sharing in the Long Tail of the Localisation Market (Lewis et al., LREC 2012)