Acquisition of bilingual MT lexicons from OCRed dictionaries

Burcu Karagol-Ayan, David Doermann, Bonnie J. Dorr


Abstract
This paper describes an approach to analyzing the lexical structure of OCRed bilingual dictionaries to construct resources suited for machine translation of low-density languages, where online resources are limited. A rule-based, an HMM-based, and a post-processed HMM-based method are used for rapid construction of MT lexicons based on systematic structural clues provided in the original dictionary. We evaluate the effectiveness of our techniques, concluding that: (1) the rule-based method performs better with dictionaries where the font is not an important distinguishing feature for determining information types; (2) the post-processed stochastic method improves the results of the stochastic method for phrasal entries; and (3) our resulting bilingual lexicons are comprehensive enough to provide the basis for reasonable translation results when compared to human translations.
Anthology ID:
2003.mtsummit-papers.28
Volume:
Proceedings of Machine Translation Summit IX: Papers
Month:
September 23-27
Year:
2003
Address:
New Orleans, USA
Venue:
MTSummit
URL:
https://aclanthology.org/2003.mtsummit-papers.28
Cite (ACL):
Burcu Karagol-Ayan, David Doermann, and Bonnie J. Dorr. 2003. Acquisition of bilingual MT lexicons from OCRed dictionaries. In Proceedings of Machine Translation Summit IX: Papers, New Orleans, USA.
Cite (Informal):
Acquisition of bilingual MT lexicons from OCRed dictionaries (Karagol-Ayan et al., MTSummit 2003)
PDF:
https://aclanthology.org/2003.mtsummit-papers.28.pdf