Seeing the wood for the trees: data-oriented translation

Mary Hearne, Andy Way


Abstract
Data-Oriented Translation (DOT), which is based on Data-Oriented Parsing (DOP), comprises an experience-based approach to translation, where new translations are derived with reference to grammatical analyses of previous translations. Previous DOT experiments [Poutsma, 1998, Poutsma, 2000a, Poutsma, 2000b] were small in scale because important advances in DOP technology were not incorporated into the translation model. Despite this, related work [Way, 1999, Way, 2003a, Way, 2003b] reports that DOT models are viable in that solutions to ‘hard’ translation cases are readily available. However, it has not been shown to date that DOT models scale to larger datasets. In this work, we describe a novel DOT system, inspired by recent advances in DOP parsing technology. We test our system on larger, more complex corpora than have been used heretofore, and present both automatic and human evaluations which show that high quality translations can be achieved at reasonable speeds.
Anthology ID:
2003.mtsummit-papers.22
Volume:
Proceedings of Machine Translation Summit IX: Papers
Month:
September 23-27
Year:
2003
Address:
New Orleans, USA
Venue:
MTSummit
SIG:
Publisher:
Note:
Pages:
Language:
URL:
https://aclanthology.org/2003.mtsummit-papers.22
DOI:
Bibkey:
Cite (ACL):
Mary Hearne and Andy Way. 2003. Seeing the wood for the trees: data-oriented translation. In Proceedings of Machine Translation Summit IX: Papers, New Orleans, USA.
Cite (Informal):
Seeing the wood for the trees: data-oriented translation (Hearne & Way, MTSummit 2003)
Copy Citation:
PDF:
https://aclanthology.org/2003.mtsummit-papers.22.pdf