Representation of complex terms in a vector space structured by an ontology for a normalization task

Arnaud Ferré, Pierre Zweigenbaum, Claire Nédellec


Abstract
We propose a semi-supervised method for labeling terms in texts with concepts of a domain ontology. The method generates continuous vector representations of complex terms in a semantic space structured by the ontology. It relies on a distributional semantics approach, which generates an initial vector for each extracted term. These vectors are then embedded in a vector space constructed from the structure of the ontology; this embedding is carried out by training a linear model. Finally, we compute distances between term vectors and concept vectors to determine their proximity, and thus assign ontology labels to terms. We evaluated the quality of these representations on a normalization task, using the concepts of an ontology as semantic labels. Term normalization is an important step in extracting part of the information contained in texts, but the generated vector space might find other applications. The performance of the method is comparable to the state of the art for this normalization task, opening up encouraging prospects.
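
The three steps named in the abstract (initial term vectors, a trained linear embedding into the ontology space, and a distance-based label assignment) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of NumPy, ordinary least squares as the way to train the linear model, cosine similarity as the proximity measure, the encoding of each concept by itself and its ancestors, the toy vectors, and all names (`fit_linear_map`, `nearest_concept`, `C0`, `C1`, `C2`).

```python
import numpy as np

# Hypothetical ontology with three concepts; C1 and C2 are children of C0.
# Assumption: each concept vector marks the concept and its ancestors.
concept_ids = ["C0", "C1", "C2"]
concept_matrix = np.array([
    [1.0, 0.0, 0.0],  # C0 (root)
    [1.0, 1.0, 0.0],  # C1 (child of C0)
    [1.0, 0.0, 1.0],  # C2 (child of C0)
])

# Toy 4-dimensional "distributional" vectors for four labeled training terms.
train_terms = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.2, 0.9, 0.1],
    [0.2, 0.0, 0.1, 0.7],
])
train_labels = ["C0", "C1", "C2", "C1"]


def fit_linear_map(term_vecs, target_vecs):
    """Least-squares W such that term_vecs @ W approximates target_vecs."""
    W, *_ = np.linalg.lstsq(term_vecs, target_vecs, rcond=None)
    return W


def nearest_concept(term_vec, W, concepts, ids):
    """Project a term vector into the ontology space and return the id of
    the concept with the highest cosine similarity."""
    projected = term_vec @ W
    norms = np.linalg.norm(concepts, axis=1) * np.linalg.norm(projected)
    sims = (concepts @ projected) / np.where(norms == 0.0, 1.0, norms)
    return ids[int(np.argmax(sims))]


# Build the training targets: the concept vector of each term's label.
targets = np.array([concept_matrix[concept_ids.index(c)] for c in train_labels])
W = fit_linear_map(train_terms, targets)

predictions = [
    nearest_concept(v, W, concept_matrix, concept_ids) for v in train_terms
]
print(predictions)
```

In this toy setting the linear map fits the four training terms exactly, so the sketch only shows the mechanics; it says nothing about how the method behaves on unseen terms or on a real ontology.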
Anthology ID:
W17-2312
Volume:
BioNLP 2017
Month:
August
Year:
2017
Address:
Vancouver, Canada
Venues:
BioNLP | WS
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
99–106
URL:
https://aclanthology.org/W17-2312
DOI:
10.18653/v1/W17-2312
Cite (ACL):
Arnaud Ferré, Pierre Zweigenbaum, and Claire Nédellec. 2017. Representation of complex terms in a vector space structured by an ontology for a normalization task. In BioNLP 2017, pages 99–106, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Representation of complex terms in a vector space structured by an ontology for a normalization task (Ferré et al., 2017)
PDF:
https://aclanthology.org/W17-2312.pdf