David D. Lewis

Also published as: David Lewis

2016

Open Data Vocabularies for Assigning Usage Rights to Data Resources from Translation Projects
David Lewis | Kaniz Fatema | Alfredo Maldonado | Brian Walshe | Arturo Calvo
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

An assessment of the intellectual property requirements for data used in machine-aided translation is provided, based on a recent EC-funded legal review. This is compared against the capabilities offered by current linked open data standards from the W3C for publishing and sharing translation memories from translation projects. Proposals are then suggested for adequately addressing the intellectual property needs of stakeholders in translation projects using open data vocabularies.

2015

FALCON: Federated Active Linguistic data CuratiON
David Lewis
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

An empirical study of segment prioritization for incrementally retrained post-editing-based SMT
Jinhua Du | Ankit Srivastava | Andy Way | Alfredo Maldonado-Guerra | David Lewis
Proceedings of Machine Translation Summit XV: Papers

2014

As language resources start to become available in linked data formats, it becomes relevant to consider how linked data interoperability can play a role in active language processing workflows as well as for more static language resource publishing. This paper proposes that linked data may have a valuable role to play in tracking the use and generation of language resources in such workflows in order to assess and improve the performance of the language technologies that use the resources, based on feedback from the human involvement typically required within such processes. We refer to this as Active Curation of the language resources, since it is performed systematically over language processing workflows to continuously improve the quality of the resource in specific applications, rather than via dedicated curation steps. We use modern localisation workflows, i.e. assisted by machine translation and text analytics services, to explain how linked data can support such active curation. By referencing how a suitable linked data vocabulary can be assembled by combining existing linked data vocabularies and metadata from other multilingual content processing annotations and tool exchange standards, we aim to demonstrate the relative ease with which active curation can be deployed more broadly.

2012

On Using Linked Data for Language Resource Sharing in the Long Tail of the Localisation Market
David Lewis | Alexander O’Connor | Andrzej Zydroń | Gerd Sjögren | Rahzeb Choudhury
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Innovations in localisation have focused on the collection and leverage of language resources. However, smaller localisation clients and Language Service Providers are poorly positioned to exploit the benefits of language resource reuse in comparison to larger companies. Their low throughput of localised content means they have little opportunity to amass significant resources, such as translation memories and terminology databases, to reuse between jobs or to train statistical machine translation engines tailored to their domain specialisms and language pairs. We propose addressing this disadvantage via the sharing and pooling of language resources. However, the current localisation standards do not support multiparty sharing, are not well integrated with emerging language resource standards, and do not address key requirements in determining ownership and license terms for resources. We survey standards and research in the areas of Localisation, Language Resources and Language Technologies to leverage existing localisation standards via Linked Data methodologies. This points to the potential of using semantic representation of existing data models for localisation workflow metadata, terminology, parallel text, provenance and access control, which we illustrate with an RDF example.