A method to reduce ambiguity at the level of word tagging, on the basis of local syntactic constraints, is described. Such “short context” constraints are easy to process and can remove most of the ambiguity at that level, which is otherwise a source of great difficulty for parsers and other applications in certain natural languages. The use of local constraints is also very effective for quick invalidation of a large set of ill-formed inputs. While in some approaches local constraints are defined manually or discovered by processing large corpora, we extract them directly from a grammar (typically context-free) of the given language. We focus on deterministic constraints, but later extend the method to a probabilistic language model.
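As a rough illustration of the idea of deriving short-context constraints from a grammar, the sketch below computes the pairs of tags that may stand next to each other under a small context-free grammar, then uses those pairs to discard tag readings that cannot take part in any consistent sequence. It is a minimal sketch under stated assumptions, not the method of the paper: the toy grammar, the tag names, the boundary markers `<s>` and `</s>`, and the function names are all hypothetical, the context length is fixed at two, and the grammar is assumed to have no empty productions and no useless nonterminals.

```python
from itertools import product

# Hypothetical toy grammar: keys are nonterminals, all other symbols are tags.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("det", "noun"), ("noun",)],
    "VP": [("verb", "NP"), ("verb",)],
}

def first_last_sets(grammar):
    """Tags that can begin (FIRST) or end (LAST) a string derived from each nonterminal."""
    first = {nt: set() for nt in grammar}
    last = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for lhs, rules in grammar.items():
            for rhs in rules:
                for table, sym in ((first, rhs[0]), (last, rhs[-1])):
                    new = table[sym] if sym in grammar else {sym}
                    if not new <= table[lhs]:
                        table[lhs] |= new
                        changed = True
    return first, last

def allowed_pairs(grammar, start):
    """All (left_tag, right_tag) pairs that can be adjacent in a derived string."""
    first, last = first_last_sets(grammar)

    def ends(table, sym):
        return table[sym] if sym in grammar else {sym}

    pairs = set()
    for rules in grammar.values():
        for rhs in rules:
            for left, right in zip(rhs, rhs[1:]):
                pairs |= set(product(ends(last, left), ends(first, right)))
    # Sentence boundaries are treated as pseudo-tags.
    pairs |= {("<s>", tag) for tag in first[start]}
    pairs |= {(tag, "</s>") for tag in last[start]}
    return pairs

def prune(candidates, pairs):
    """Drop tags that lie on no boundary-to-boundary path of allowed pairs."""
    lattice = [{"<s>"}] + [set(c) for c in candidates] + [{"</s>"}]
    for i in range(1, len(lattice)):  # forward pass: keep tags with a valid predecessor
        lattice[i] = {t for t in lattice[i]
                      if any((p, t) in pairs for p in lattice[i - 1])}
    for i in range(len(lattice) - 2, -1, -1):  # backward pass: keep tags with a valid successor
        lattice[i] = {t for t in lattice[i]
                      if any((t, n) in pairs for n in lattice[i + 1])}
    return lattice[1:-1]

PAIRS = allowed_pairs(GRAMMAR, "S")

# "the can rusts": the second and third words are noun/verb ambiguous.
print(prune([{"det"}, {"noun", "verb"}, {"noun", "verb"}], PAIRS))
# An input with two determiners in a row is rejected: every position ends up empty.
print(prune([{"det"}, {"det"}], PAIRS))
```

Passing this filter is only a necessary condition for grammaticality: a sequence in which every adjacent pair is allowed may still fail a full parse, which is why such constraints serve as a cheap pre-filter rather than as a replacement for the parser.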
IBM is engaged in advanced research and development projects on various aspects of machine translation between several language pairs. The activities reported on here are all part of a rather large-scale, international effort, following Michael McCord’s LMT approach. The paper focuses on seven selected topics: recent enhancements made in the Slot Grammar formalism and the specific analysis components; specification of a semantic type hierarchy and its use for verb sense disambiguation; incorporation of statistical techniques in the translation process; anaphora resolution; linkage of target morphology modules; methods for the construction of large MT lexicons; and interactive disambiguation.