Adaptive bilingual sentence alignment

Thomas C. Chuang, G.N. You, Jason Chang


Abstract
We present a new approach, based on adaptive learning, to the problem of aligning English and Chinese sentences in a bilingual corpus. Using length information alone produces surprisingly good results for aligning French and English sentences, with success rates well over 95%, but it does not fare as well for the alignment of English and Chinese sentences. The crux of the problem lies in the greater variability of lengths and match types of the matched sentences. We propose to cope with such variability via a two-pass scheme under which model parameters can be learned from the data at hand. Experiments show that under this approach bilingual English-Chinese texts can be aligned effectively across diverse domains, genres, and translation directions, with accuracy rates approaching 99%.
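The abstract gives no implementation details, so the following is only a minimal sketch of what a two-pass, length-based aligner could look like, not the authors' method. It assumes a standard dynamic-programming aligner over sentence lengths: the first pass uses fixed, illustrative parameters, and the second pass re-estimates the length ratio, the variance, and the match-type priors from the first-pass output. The function names, the bead representation, the default values, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

# Match types as (source sentences, target sentences) consumed per bead.
MATCH_TYPES = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2), (2, 2)]

# Illustrative first-pass parameters (assumed, not taken from the paper).
DEFAULT_RATIO = 1.0
DEFAULT_VAR = 6.8
DEFAULT_PRIORS = {(1, 1): 0.89, (1, 0): 0.005, (0, 1): 0.005,
                  (2, 1): 0.045, (1, 2): 0.045, (2, 2): 0.01}


def align(src_lens, tgt_lens, ratio, var, priors):
    """Length-based DP alignment; returns beads (src_start, src_n, tgt_start, tgt_n)."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in MATCH_TYPES:
                if i + di > n or j + dj > m:
                    continue
                ls = sum(src_lens[i:i + di])
                lt = sum(tgt_lens[j:j + dj])
                # Normalized deviation of target length from its expected value.
                delta = (lt - ls * ratio) / math.sqrt(max(ls, 1) * var)
                c = cost[i][j] + 0.5 * delta * delta - math.log(priors[(di, dj)])
                if c < cost[i + di][j + dj]:
                    cost[i + di][j + dj] = c
                    back[i + di][j + dj] = (di, dj)
    # Trace the cheapest path back from the end of both texts.
    beads, i, j = [], n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        i, j = i - di, j - dj
        beads.append((i, di, j, dj))
    beads.reverse()
    return beads


def reestimate(beads, src_lens, tgt_lens):
    """Learn length ratio, variance, and match-type priors from an alignment."""
    ones = [(src_lens[i], tgt_lens[j])
            for i, di, j, dj in beads if (di, dj) == (1, 1)]
    total_s = sum(s for s, _ in ones)
    if not ones or total_s == 0:
        return DEFAULT_RATIO, DEFAULT_VAR, DEFAULT_PRIORS
    ratio = sum(t for _, t in ones) / total_s
    var = sum((t - s * ratio) ** 2 / max(s, 1) for s, t in ones) / len(ones)
    var = max(var, 1e-3)  # floor keeps the cost function finite
    counts = Counter((di, dj) for _, di, _, dj in beads)
    total = len(beads) + len(MATCH_TYPES)  # add-one smoothing over match types
    priors = {mt: (counts[mt] + 1) / total for mt in MATCH_TYPES}
    return ratio, var, priors


def two_pass_align(src_lens, tgt_lens):
    """Pass 1 with default parameters, pass 2 with parameters learned from pass 1."""
    first = align(src_lens, tgt_lens, DEFAULT_RATIO, DEFAULT_VAR, DEFAULT_PRIORS)
    ratio, var, priors = reestimate(first, src_lens, tgt_lens)
    return align(src_lens, tgt_lens, ratio, var, priors), ratio
```

Calling `two_pass_align(src_lens, tgt_lens)` with two lists of sentence lengths returns the second-pass beads together with the learned length ratio. Re-estimating from the text itself is what lets the parameters adapt to a given domain or translation direction instead of relying on values tuned for another language pair.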
Anthology ID:
2002.amta-papers.3
Volume:
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
October 8-12
Year:
2002
Address:
Tiburon, USA
Editor:
Stephen D. Richardson
Venue:
AMTA
Publisher:
Springer
Pages:
21–30
URL:
https://link.springer.com/chapter/10.1007/3-540-45820-4_3
Cite (ACL):
Thomas C. Chuang, G.N. You, and Jason Chang. 2002. Adaptive bilingual sentence alignment. In Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 21–30, Tiburon, USA. Springer.
Cite (Informal):
Adaptive bilingual sentence alignment (Chuang et al., AMTA 2002)
PDF:
https://link.springer.com/chapter/10.1007/3-540-45820-4_3