A self-learning method of parallel texts alignment

António Ribeiro, Gabriel Lopes, João Mexia


Abstract
This paper describes a language-independent method for aligning parallel texts that re-uses acquired knowledge. The system extracts word translation equivalents and re-uses them as correspondence points to improve the alignment of parallel texts. Points that may cause misalignment are filtered using confidence bands of linear regression analysis rather than heuristics, which are not theoretically reliable. Homographs bootstrap the alignment process and are used to build the primary word translation lexicon. At each step, the previously acquired lexicon is re-used to make progressively finer-grained alignments and produce more reliable translation lexicons.
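The filtering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `filter_points`, the 99% level, the use of a prediction band (the band for individual points) and the normal approximation to the t quantile are all assumptions made here for illustration. Candidate correspondence points are taken to be (source position, target position) pairs.

```python
from math import sqrt
from statistics import NormalDist, mean


def filter_points(points, confidence=0.99):
    """Keep candidate correspondence points that lie inside a regression band.

    points: list of (x, y) pairs, e.g. word positions in the source and
    target texts (hypothetical input format).
    confidence: assumed band level; the abstract does not state one.
    """
    n = len(points)
    if n < 3:
        # Too few points to estimate a residual variance; keep everything.
        return list(points)

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # All x values are identical; no regression line can be fitted.
        return list(points)

    # Ordinary least-squares fit of y = intercept + slope * x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [y - (intercept + slope * x) for x, y in points]
    s = sqrt(sum(r * r for r in residuals) / (n - 2))

    # Assumption: normal quantile as a large-sample stand-in for Student's t.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    kept = []
    for (x, y), r in zip(points, residuals):
        # Half-width of the prediction band at this x.
        half_width = z * s * sqrt(1 + 1 / n + (x - x_bar) ** 2 / sxx)
        if abs(r) <= half_width:
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    # Twenty points on the diagonal plus one point far from it.
    candidates = [(i * 10, i * 10) for i in range(20)] + [(100, 400)]
    print(filter_points(candidates))
```

In an iterative setting like the one the abstract describes, a filter of this kind would be re-applied each time new candidate points are obtained from the lexicon; the sketch covers a single pass only.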
Anthology ID:
2000.amta-papers.4
Volume:
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
October 10-14
Year:
2000
Address:
Cuernavaca, Mexico
Editor:
John S. White
Venue:
AMTA
Publisher:
Springer
Pages:
30–39
URL:
https://link.springer.com/chapter/10.1007/3-540-39965-8_4
Cite (ACL):
António Ribeiro, Gabriel Lopes, and João Mexia. 2000. A self-learning method of parallel texts alignment. In Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 30–39, Cuernavaca, Mexico. Springer.
Cite (Informal):
A self-learning method of parallel texts alignment (Ribeiro et al., AMTA 2000)
PDF:
https://link.springer.com/chapter/10.1007/3-540-39965-8_4