V. M. Sánchez-Cartagena
2012
Choosing the correct paradigm for unknown words in rule-based machine translation systems
V. M. Sánchez-Cartagena | M. Esplà-Gomis | F. Sánchez-Martínez | J. A. Pérez-Ortiz
Proceedings of the Third International Workshop on Free/Open-Source Rule-Based Machine Translation
Previous work on an interactive system that helps non-expert users enlarge the monolingual dictionaries of rule-based machine translation (MT) systems worked by discarding those inflection paradigms that cannot generate a set of inflected word forms validated by the user. This method, however, cannot deal with the common case in which several different paradigms generate exactly the same set of inflected word forms, although with different inflection information attached. In this paper, we propose the use of an n-gram-based model of lexical categories and inflection information to select a single paradigm in cases where more than one paradigm generates the same set of word forms. Results obtained with a Spanish monolingual dictionary show that the correct paradigm is chosen for around 75% of the unknown words. The resulting system, available under an open-source license, is therefore a valuable aid for non-expert users without technical linguistic knowledge who enlarge the monolingual dictionaries used in MT.
An open-source toolkit for integrating shallow-transfer rules into phrase-based statistical machine translation
V. M. Sánchez-Cartagena | F. Sánchez-Martínez | J. A. Pérez-Ortiz
Proceedings of the Third International Workshop on Free/Open-Source Rule-Based Machine Translation
In this paper, we present an open-source toolkit to enrich a phrase-based statistical machine translation system (Moses) with phrase pairs generated from the linguistic resources of a shallow-transfer rule-based machine translation system (Apertium). A system built with this toolkit was not outperformed by any other participant in the shared translation task of the Sixth Workshop on Statistical Machine Translation (WMT 11) for the Spanish–English language pair.