On the annotation of TMX translation memories for advanced leveraging in computer-aided translation

Mikel L. Forcada

Abstract

The term advanced leveraging refers to extensions beyond the current usage of translation memory (TM) in computer-aided translation (CAT). One of these extensions is the ability to identify and use matches on the sub-segment level (for instance, using sub-sentential elements when segments are sentences) to help the translator when a reasonable fuzzy-matched proposal is not available; some such functionalities have started to become available in commercial CAT tools. Resources such as statistical word aligners, external machine translation systems, glossaries and term bases could be used to identify and annotate segment-level translation units at the sub-segment level, but there is currently no single, agreed standard supporting the interchange of sub-segmental annotation of translation memories to create a richer translation resource. This paper discusses the capabilities and limitations of some current standards, envisages possible alternatives, and ends with a tentative proposal which slightly abuses (repurposes) the usage of existing elements in the TMX standard.

Anthology ID: L14-1321
Volume: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month: May
Year: 2014
Address: Reykjavik, Iceland
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 4374–4378
URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/373_Paper.pdf
Cite (ACL): Mikel Forcada. 2014. On the annotation of TMX translation memories for advanced leveraging in computer-aided translation. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 4374–4378, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal): On the annotation of TMX translation memories for advanced leveraging in computer-aided translation (Forcada, LREC 2014)
