L&H lexicography toolkit for machine translation

Timothy Meekhof, David Clements


Abstract
One of the most important components of any machine translation system is the translation lexicon. The size and quality of the lexicon, together with its coverage of a particular use, strongly influence how applicable machine translation is for a given user. The high cost of lexicon development limits how far even mature machine translation vendors can expand and specialize their lexicons, and it frequently prevents users from building extensive lexicons at all. To address the high cost of lexicography for machine translation, L&H is building a Lexicography Toolkit whose tools can significantly improve the process of creating custom lexicons. The toolkit is based on automatic data acquisition from text corpora to generate lexicon entries. Because lexicon entries must be accurate, the toolkit's output must be checked by human experts at several stages. However, this checking consists mostly of removing erroneous results rather than adding data or entire entries. This article explores how the Lexicography Toolkit would be used to create a lexicon specific to the user’s domain.
Anthology ID: 2000.amta-systems.6
Volume: Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: System Descriptions
Month: October 10–14
Year: 2000
Address: Cuernavaca, Mexico
Editor: John S. White
Venue: AMTA
Publisher: Springer
Pages: 213–218
URL: https://link.springer.com/chapter/10.1007/3-540-39965-8_24
Cite (ACL): Timothy Meekhof and David Clements. 2000. L&H lexicography toolkit for machine translation. In Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: System Descriptions, pages 213–218, Cuernavaca, Mexico. Springer.