Using word formation rules to extend MT lexicons

Claudia Gdaniec, Esmé Manandise


Abstract
In the IBM LMT Machine Translation (MT) system, a built-in strategy provides lexical coverage of a particular subset of words that are not listed in its bilingual lexicons. The recognition and coding of these words and their transfer generation is based on a set of derivational morphological rules. A new utility extends unfound words of this type in an LMT-compatible format in an auxiliary bilingual lexical file to be subsequently merged into the core lexicons. What characterizes this approach is the use of morphological, semantic, and syntactic features for both analysis and transfer. The auxiliary lexical file (ALF) has to be revised before a merge into the core lexicons. This utility integrates a linguistics-based analysis and transfer rules with a corpus-based method of verifying or falsifying linguistic hypotheses against extensive document translation, which in addition yields statistics on frequencies of occurrence as well as local context.
Anthology ID:
2002.amta-papers.7
Volume:
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
October 8-12
Year:
2002
Address:
Tiburon, USA
Venue:
AMTA
SIG:
Publisher:
Springer
Note:
Pages:
64–73
Language:
URL:
https://link.springer.com/chapter/10.1007/3-540-45820-4_7
DOI:
Bibkey:
Copy Citation:
PDF:
https://link.springer.com/chapter/10.1007/3-540-45820-4_7