Improving domain-specific word alignment with a general bilingual corpus

Hua Wu, Haifeng Wang


Abstract
Among conventional word alignment methods, some employ statistical models or statistical measures, which require large-scale, sentence-aligned bilingual training corpora. Others use dictionaries to guide alignment selection. However, these methods yield unsatisfactory results when aligning words in a small-scale, domain-specific bilingual corpus for which no terminological lexicon is available. This paper proposes an approach that improves word alignment in a specific domain, where only a small-scale domain-specific corpus is available, by adapting word alignment information from the general domain to the specific domain. The approach first trains two statistical word alignment models, one on the large-scale general-domain corpus and one on the small-scale domain-specific corpus, and then uses both models to improve domain-specific word alignment. Experimental results show significant improvements in both alignment precision and recall, with a relative error rate reduction of 21.96% compared with state-of-the-art techniques.
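The abstract does not say exactly how the two models are combined. As a rough illustration only, the Python sketch below assumes each trained model is reduced to a lexical translation table p(target | source) and merges the two tables by linear interpolation with a hypothetical weight `lam`. The function names `interpolate` and `align`, the toy tables, and the greedy per-word link selection are all assumptions made for this example, not the authors' method.

```python
# Hypothetical sketch: combine a general-domain and a domain-specific
# lexical translation table, then align with the merged table.
# Tables map (source_word, target_word) -> p(target_word | source_word).


def interpolate(t_general, t_domain, lam):
    """Linearly interpolate two lexical translation tables (assumed technique).

    lam weights the domain-specific table; (1 - lam) weights the general one.
    A pair missing from a table is treated as probability 0 in that table.
    """
    keys = set(t_general) | set(t_domain)
    return {
        k: lam * t_domain.get(k, 0.0) + (1 - lam) * t_general.get(k, 0.0)
        for k in keys
    }


def align(src_words, tgt_words, table):
    """Greedily link each target word to its most probable source word.

    For each target position j, pick argmax_i p(tgt_j | src_i).
    Target words with no nonzero candidate stay unaligned.
    Returns a set of (source_index, target_index) links.
    """
    links = set()
    for j, t in enumerate(tgt_words):
        best_i, best_p = None, 0.0
        for i, s in enumerate(src_words):
            p = table.get((s, t), 0.0)
            if p > best_p:
                best_i, best_p = i, p
        if best_i is not None:
            links.add((best_i, j))
    return links


if __name__ == "__main__":
    # Toy tables (invented for illustration): the general table knows common
    # words, the domain table knows the domain term "file" -> "fichier".
    t_general = {("the", "le"): 0.8, ("open", "ouvrez"): 0.4,
                 ("file", "dossier"): 0.5}
    t_domain = {("file", "fichier"): 0.9, ("open", "ouvrez"): 0.6}

    src = ["open", "the", "file"]
    tgt = ["ouvrez", "le", "fichier"]

    merged = interpolate(t_general, t_domain, lam=0.5)
    print(sorted(align(src, tgt, merged)))  # [(0, 0), (1, 1), (2, 2)]
```

In this toy example the domain-only table cannot link the common pair (the, le) and the general-only table cannot link the domain term (file, fichier), while the interpolated table links all three words. In practice the weight would need tuning, for instance on held-out domain data.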
Anthology ID:
2004.amta-papers.29
Volume:
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
September 28–October 2
Year:
2004
Address:
Washington, USA
Editors:
Robert E. Frederking, Kathryn B. Taylor
Venue:
AMTA
Publisher:
Springer
Pages:
262–271
URL:
https://link.springer.com/chapter/10.1007/978-3-540-30194-3_29
Cite (ACL):
Hua Wu and Haifeng Wang. 2004. Improving domain-specific word alignment with a general bilingual corpus. In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 262–271, Washington, USA. Springer.
Cite (Informal):
Improving domain-specific word alignment with a general bilingual corpus (Wu & Wang, AMTA 2004)