This paper describes the use of decision trees to learn lexical information for the enrichment of our natural language processing (NLP) system. Our approach to lexical learning differs from other approaches in the field in that our machine learning techniques exploit a deep knowledge understanding system. After the introduction we present the overall architecture of our lexical learning module. In the following sections we present a showcase of lexical learning using decision trees: we learn verbs that take a human subject in Spanish and French.
We describe the implementation of two new language pairs (English-French and English-German) which use machine-learned sentence realization components instead of hand-written generation components. The resulting systems are evaluated by human evaluators, and in the technical domain, are equal to the quality of highly respected commercial systems. We comment on the difficulties that are encountered when using machine-learned sentence realization in the context of MT.
Past research has shown that the ideal MT system should be modular and devoid of language pair specific information in its design. We describe here the assembly of TAMTAM (Traduction Automatique Microsoft), the French-English research MT system under development at Microsoft, which was constructed from a combination of pre-existing rule-based components and automatically created components. At this stage, the system has not been adapted either computationally or linguistically to the French-English context and yet it performs only slightly below the French-English Systran system in independent blind human evaluations