2008
Comparing Constituency and Dependency Representations for SMT Phrase-Extraction
Mary Hearne, Sylwia Ozdowska, John Tinsley
Proceedings of the 15th Conference on Natural Language Processing: Short Papers (Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts)
We consider the value of replacing and/or combining string-based methods with syntax-based methods for phrase-based statistical machine translation (PB-SMT), and we also consider the relative merits of using constituency-annotated vs. dependency-annotated training data. We automatically derive two subtree-aligned treebanks, one dependency-based and one constituency-based, from a parallel English–French corpus and extract syntactically motivated word- and phrase-pairs from each. We measure PB-SMT quality using automatic evaluation metrics. The results show that combining string-based and syntax-based word- and phrase-pairs can improve translation quality irrespective of the type of syntactic annotation. Furthermore, using dependency annotation yields greater translation quality than constituency annotation for PB-SMT.
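The abstract does not say how the string-based and syntax-based pairs are combined. The sketch below is a minimal illustration under one assumption: the occurrences from both extractors are simply pooled. A standard relative-frequency estimate, p(target | source) = c(source, target) / c(source), is then computed. The function names and the toy English–French pairs are hypothetical and are not taken from the paper.

```python
from collections import Counter


def combine_phrase_pairs(string_pairs, syntax_pairs):
    """Pool (source, target) phrase-pair occurrences from two extractors.

    Assumption: plain count pooling; the paper may weight or filter
    the two sources differently.
    """
    counts = Counter(string_pairs)
    counts.update(syntax_pairs)
    return counts


def translation_probs(pair_counts):
    """Relative-frequency estimate p(target | source) = c(s, t) / c(s)."""
    source_totals = Counter()
    for (src, _tgt), c in pair_counts.items():
        source_totals[src] += c
    return {
        (src, tgt): c / source_totals[src]
        for (src, tgt), c in pair_counts.items()
    }


# Hypothetical toy data, not results from the paper.
string_based = [("the house", "la maison"), ("the house", "la maison"),
                ("house", "maison")]
syntax_based = [("the house", "la maison"), ("the house", "maison")]

table = translation_probs(combine_phrase_pairs(string_based, syntax_based))
for pair, p in sorted(table.items()):
    print(pair, round(p, 2))
```

Under pooling, a pair that both extractors find (here "the house" / "la maison") gets a larger share of the probability mass. A pair found by only one method still enters the table, which is one way the combination can add coverage.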
2007
Robust language pair-independent sub-tree alignment
John Tinsley, Ventsislav Zhechev, Mary Hearne, Andy Way
Proceedings of Machine Translation Summit XI: Papers
Capturing translational divergences with a statistical tree-to-tree aligner
Mary Hearne, John Tinsley, Ventsislav Zhechev, Andy Way
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers
2006
Disambiguation Strategies for Data-Oriented Translation
Mary Hearne, Andy Way
Proceedings of the 11th Annual Conference of the European Association for Machine Translation
2004
Robust Sub-Sentential Alignment of Phrase-Structure Trees
Declan Groves, Mary Hearne, Andy Way
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics
2003
Seeing the wood for the trees: data-oriented translation
Mary Hearne, Andy Way
Proceedings of Machine Translation Summit IX: Papers
Data-Oriented Translation (DOT), which is based on Data-Oriented Parsing (DOP), is an experience-based approach to translation in which new translations are derived with reference to grammatical analyses of previous translations. Previous DOT experiments [Poutsma, 1998, Poutsma, 2000a, Poutsma, 2000b] were small in scale because important advances in DOP technology had not been incorporated into the translation model. Despite this, related work [Way, 1999, Way, 2003a, Way, 2003b] reports that DOT models are viable, in that solutions to ‘hard’ translation cases are readily available. However, it has not yet been shown that DOT models scale to larger datasets. In this work, we describe a novel DOT system inspired by recent advances in DOP parsing technology. We test our system on larger, more complex corpora than have been used previously, and present both automatic and human evaluations which show that high-quality translations can be achieved at reasonable speeds.
2002
Example-based machine translation via the Web
Nano Gough, Andy Way, Mary Hearne
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers
One of the limitations of translation memory systems is that the smallest translation units currently accessible are aligned sentence pairs. We propose an example-based machine translation system which uses a ‘phrasal lexicon’ in addition to the aligned sentences in its database. These phrases are extracted from the Penn Treebank using the Marker Hypothesis as a constraint on segmentation. They are then translated by three on-line machine translation (MT) systems, and a number of linguistic resources are automatically constructed for use in translating new input. We perform two experiments, on test sets of sentences and of noun phrases, to demonstrate the effectiveness of our system. In so doing, we obtain insights into the strengths and weaknesses of the selected on-line MT systems. Finally, like many example-based machine translation systems, our approach suffers from the problem of ‘boundary friction’. Where this compromises the quality of the resulting translations, we use a novel post hoc validation procedure via the World Wide Web to correct imperfect translations before they are output to the user.
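The abstract mentions the Marker Hypothesis as a constraint on segmentation. The idea is that closed-class "marker" words (determiners, prepositions, conjunctions, pronouns and so on) signal the start of a new phrase. The sketch below is a minimal illustration under two assumptions: a tiny hypothetical marker lexicon (`MARKERS`), and a rule that a new chunk opens at a marker word only once the current chunk already contains a content word. The function `marker_segment` is hypothetical and is not the authors' system.

```python
from __future__ import annotations

# Hypothetical mini marker lexicon (closed-class word -> marker category).
# A real system would use full closed-class word lists.
MARKERS = {
    "the": "DET", "a": "DET", "an": "DET",
    "in": "PREP", "on": "PREP", "of": "PREP", "with": "PREP",
    "and": "CONJ", "but": "CONJ",
    "he": "PRON", "she": "PRON", "it": "PRON",
}


def marker_segment(sentence: str) -> list[tuple[str | None, list[str]]]:
    """Split a sentence into marker-headed chunks.

    A marker word opens a new chunk only if the current chunk already
    holds a non-marker (content) word, so runs such as "in the" stay
    together. Each chunk is tagged with the category of its first word
    (None if it does not start with a marker).
    """
    chunks: list[tuple[str | None, list[str]]] = []
    current: list[str] = []

    def flush() -> None:
        chunks.append((MARKERS.get(current[0].lower()), list(current)))
        current.clear()

    for word in sentence.split():
        is_marker = word.lower() in MARKERS
        has_content = any(w.lower() not in MARKERS for w in current)
        if is_marker and has_content:
            flush()
        current.append(word)
    if current:
        flush()
    return chunks


print(marker_segment("the man walked in the park with a dog"))
# → [('DET', ['the', 'man', 'walked']), ('PREP', ['in', 'the', 'park']), ('PREP', ['with', 'a', 'dog'])]
```

Requiring a content word before a chunk is closed is what stops every marker from producing its own one-word segment. Tagging each chunk with its marker category gives a simple hook for matching chunks across languages.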