The Cambridge Language Survey
Paul Procter
Third International EAMT Workshop: Machine Translation and the Lexicon

The Cambridge Language Survey is a research project centred on the use of an Integrated Language Database, in which a computerised dictionary provides intelligent cross-reference during corpus analysis, for example by searching for all the inflections of a verb rather than just the base form. Types of grammatical coding and semantic categorisation appropriate to such a computerised dictionary are discussed, as are software tools for parsing, finding collocations, and performing sense-tagging. The weighted evaluation of semantic, grammatical, and collocational information to discriminate between word senses is described in some detail. Several branches of research are mentioned, including the development of parallel corpora, semantic interpretation by sense-tagging, and the use of a Learner Corpus for the analysis of errors made by non-native speakers. Sense-tagging is identified as an under-exploited approach to language analysis, and one that offers great opportunities for product development.
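
The dictionary-backed corpus search mentioned above (finding every inflection of a verb rather than only its base form) can be illustrated with a minimal sketch. This is not the project's own software: the `LEXICON` table, the `find_inflections` function, and the sample sentence are all hypothetical, and a real Integrated Language Database would hold far richer entries.

```python
import re

# Hypothetical dictionary fragment: base form -> its inflected forms.
# Illustrative data only, not taken from the Cambridge dictionary.
LEXICON = {
    "take": ["take", "takes", "took", "taken", "taking"],
}

def find_inflections(lemma, text, lexicon=LEXICON):
    """Return (offset, matched form) for each inflection of `lemma` found in `text`."""
    # Fall back to the bare lemma when the dictionary has no entry for it.
    forms = lexicon.get(lemma, [lemma])
    # Word boundaries restrict matches to whole words, so "mistaken" is not a hit.
    # Listing longer forms first is only a precaution; the boundaries already
    # prevent "take" from matching inside "takes".
    ordered = sorted(forms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(f) for f in ordered) + r")\b"
    return [
        (m.start(), m.group(0).lower())
        for m in re.finditer(pattern, text, flags=re.IGNORECASE)
    ]

corpus = "She took the train. He has taken it before, and takes it daily. Mistaken identity."
hits = find_inflections("take", corpus)
matched_forms = [form for _, form in hits]
```

Here `matched_forms` collects the irregular forms "took" and "taken" alongside "takes", which a plain string search for "take" would miss or only partly match. The same lookup could in principle feed the collocation and sense-tagging tools the abstract describes, though how the project connects them is not stated here.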