DUSTer: a method for unraveling cross-language divergences for statistical word-level alignment

Bonnie Dorr, Lisa Pearl, Rebecca Hwa, Nizar Habash


Abstract
The frequent occurrence of divergences, structural differences between languages, presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.
Anthology ID:
2002.amta-papers.4
Volume:
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
October 8-12
Year:
2002
Address:
Tiburon, USA
Venue:
AMTA
Publisher:
Springer
Pages:
31–43
URL:
https://link.springer.com/chapter/10.1007/3-540-45820-4_4