A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora

Arul Menezes, Stephen D. Richardson


Abstract
Translation systems that automatically extract transfer mappings (rules or examples) from bilingual corpora have been hampered by the difficulty of achieving accurate alignment and acquiring high-quality mappings. We describe an algorithm that uses a best-first strategy and a small alignment grammar to significantly improve the quality of the extracted mappings. For each mapping, frequencies are computed and sufficient context is retained to distinguish competing mappings during translation. Variants of the algorithm are run against a corpus containing 200K sentence pairs and evaluated based on the quality of the resulting translations.
Anthology ID:
2001.mtsummit-ebmt.4
Volume:
Workshop on Example-Based Machine Translation
Month:
September 18-22
Year:
2001
Address:
Santiago de Compostela, Spain
Editors:
Michael Carl, Andy Way
Venue:
MTSummit
URL:
https://aclanthology.org/2001.mtsummit-ebmt.4
Cite (ACL):
Arul Menezes and Stephen D. Richardson. 2001. A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora. In Workshop on Example-Based Machine Translation, Santiago de Compostela, Spain.
Cite (Informal):
A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora (Menezes & Richardson, MTSummit 2001)
PDF:
https://aclanthology.org/2001.mtsummit-ebmt.4.pdf