The Semantic Atlas: an Interactive Model of Lexical Representation
Sabine Ploux | Armelle Boussidan | Hyungsuk Ji
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
In this paper we describe two geometrical models of meaning representation, the Semantic Atlas (SA) and the Automatic Contexonym Organizing Model (ACOM). The SA provides maps of meaning generated through correspondence factor analysis. The models handle different types of word relations: synonymy in the SA and co-occurrence in ACOM. Their originality lies in the use of 'cliques', fine-grained infra-linguistic sub-units of meaning. The SA is built from several dictionaries and thesauri, enhanced by a symmetrisation process. It is currently available for French and English in monolingual versions, as well as in a bilingual translation version. Other languages are under development and testing. ACOM deals with unannotated corpora. The models are used by research teams worldwide that investigate synonymy, translation processes, genre comparison, psycholinguistics and polysemy modeling. Both models can be consulted online, via a flexible interface allowing for interactive navigation, at http://dico.isc.cnrs.fr. This site is the most consulted address in the domain of the French National Center for Scientific Research (CNRS), one of the major research bodies in France. The international interest it has triggered led us to initiate the process of going open source. In the meantime, all our databases are freely available on request.
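To make the notions of symmetrisation and cliques more concrete, here is a minimal Python sketch. It assumes that symmetrisation means making the synonymy relation bidirectional (if a lists b, then b lists a) and that a clique is a maximal set of words that are all pairwise synonyms; the abstract does not spell out either definition. The toy dictionary `TOY_SYNONYMS` and the function names `symmetrise` and `maximal_cliques` are hypothetical and are not taken from the SA databases or code.

```python
def symmetrise(synonyms):
    """Build an undirected synonymy graph: if a lists b, b also lists a.

    ASSUMPTION: this is one plausible reading of "symmetrisation".
    """
    graph = {}
    for word, listed in synonyms.items():
        graph.setdefault(word, set())
        for other in listed:
            if other == word:
                continue  # ignore self-references
            graph[word].add(other)
            graph.setdefault(other, set()).add(word)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch algorithm."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if current:
                cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical toy dictionary; each entry lists only some of its synonyms.
TOY_SYNONYMS = {
    "good": ["fine", "kind"],
    "fine": ["nice"],
    "nice": ["good"],
    "kind": ["gentle"],
    "gentle": ["good"],
}

if __name__ == "__main__":
    for clique in maximal_cliques(symmetrise(TOY_SYNONYMS)):
        print(sorted(clique))
```

On this toy input, "good" belongs to two cliques, one shared with "fine" and "nice" and one shared with "kind" and "gentle", which illustrates how cliques can separate the shades of meaning of a single word.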
Inter-word associations like stagger - drunken, or intra-word sense divisions (e.g. write a diary vs. write an article) are difficult to compile using a traditional lexicographic approach. As an alternative, we present a model that reflects this kind of subtle lexical knowledge. Based on the minimal sense of a word (clique), the model (1) selects contextually related words (contexonyms) and (2) classifies them in a multi-dimensional semantic space. Trained on very large corpora, the model provides relevant, organized contexonyms that reflect the fine-grained connotations and contextual usage of the target word, as well as the distinct senses of homonyms and polysemous words. Further study on the neighbor effect showed that the model can handle the data sparseness problem.
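As a rough illustration of step (1), the sketch below ranks candidate context words by how many sentences they share with a target word. Sentence-level co-occurrence counting is an assumption chosen for simplicity; the abstract does not give the model's actual selection criteria, and step (2), the placement of words in a semantic space, is not covered here. The corpus `TOY_CORPUS` and the function `contexonym_candidates` are hypothetical.

```python
from collections import Counter


def contexonym_candidates(sentences, target, top_k=3):
    """Return the top_k (word, count) pairs co-occurring with `target`.

    ASSUMPTION: co-occurrence is counted once per sentence, with naive
    whitespace tokenisation; the real model may use different criteria.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        if target in tokens:
            counts.update(tokens - {target})
    return counts.most_common(top_k)


# Hypothetical toy corpus, far smaller than the "very large corpora"
# the abstract mentions.
TOY_CORPUS = [
    "she will write a diary every night",
    "i write my diary before bed",
    "they write an article for the journal",
    "the article was long",
]

if __name__ == "__main__":
    print(contexonym_candidates(TOY_CORPUS, "write", top_k=1))
```

With real data, function words would dominate raw counts, so some form of filtering or normalisation would be needed before the ranking becomes informative.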