Arnaud Ferré

2020

Handling Entity Normalization with no Annotated Corpus: Weakly Supervised Methods Based on Distributional Representation and Ontological Information
Arnaud Ferré | Robert Bossy | Mouhamadou Ba | Louise Deléger | Thomas Lavergne | Pierre Zweigenbaum | Claire Nédellec
Proceedings of the Twelfth Language Resources and Evaluation Conference

Entity normalization (or entity linking) is an important subtask of information extraction that links entity mentions in text to categories or concepts in a reference vocabulary. Machine-learning-based normalization methods adapt well as long as they have enough training data of sufficient quality per reference. Distributional representations are commonly used because of their capacity to handle different expressions with similar meanings. However, in specific technical and scientific domains, the small amount of training data and the relatively small size of specialized corpora remain major challenges. Recently, the machine-learning-based CONTES method has addressed these challenges for reference vocabularies that are ontologies, as is often the case in life sciences and biomedical domains. Yet its performance depends on a manually annotated corpus. Furthermore, as with other machine-learning-based methods, parametrization remains tricky. We propose a new approach to address the scarcity of training data that extends the CONTES method with corpus selection, pre-processing and weak supervision strategies, and that can yield high-performance results without any manually annotated examples. We also study which hyperparameters are most influential, and sometimes find patterns that differ from previous work. The results show that our approach significantly improves accuracy and outperforms previous state-of-the-art algorithms.

2018

Combining rule-based and embedding-based approaches to normalize textual entities with an ontology
Arnaud Ferré | Louise Deléger | Pierre Zweigenbaum | Claire Nédellec
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Representation of complex terms in a vector space structured by an ontology for a normalization task
Arnaud Ferré | Pierre Zweigenbaum | Claire Nédellec
BioNLP 2017

In this paper we propose a semi-supervised method for labeling terms in texts with concepts of a domain ontology. The method generates continuous vector representations of complex terms in a semantic space structured by the ontology. It relies on a distributional semantics approach, which generates initial vectors for each of the extracted terms. These vectors are then embedded in the vector space constructed from the structure of the ontology. This embedding is carried out by training a linear model. Finally, we apply a distance calculation to determine the proximity between term vectors and concept vectors, and thus to assign ontology labels to terms. We evaluated the quality of these representations on a normalization task, using the concepts of an ontology as semantic labels. Normalization of terms is an important step in extracting part of the information contained in texts, but the generated vector space might find other applications. The performance of this method is comparable to the state of the art for this normalization task, opening up encouraging prospects.
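
The pipeline described in this abstract (initial term vectors, a linear model mapping them into an ontology-structured space, then a distance calculation) can be illustrated with a minimal sketch. It is not the authors' implementation. The toy ontology, the three-dimensional term vectors, the ancestor-indicator encoding of concepts, the least-squares fit, and the use of cosine similarity as the distance are all assumptions made for illustration, as are the function names `ontology_vectors`, `fit_linear_map` and `normalize`.

```python
import numpy as np

def ontology_vectors(parents):
    """Assumed encoding: one dimension per concept, set to 1 for the
    concept itself and for each of its ancestors in the ontology."""
    concepts = sorted(parents)
    index = {c: i for i, c in enumerate(concepts)}
    vectors = {}
    for concept in concepts:
        vec = np.zeros(len(concepts))
        stack = [concept]
        while stack:
            node = stack.pop()
            vec[index[node]] = 1.0
            stack.extend(parents[node])
        vectors[concept] = vec
    return vectors

def fit_linear_map(X, Y):
    """Least-squares matrix W such that X @ W approximates Y."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def normalize(term_vec, W, concept_vecs):
    """Project a term vector into the ontology space and return the
    concept whose vector has the highest cosine similarity."""
    projected = term_vec @ W
    best, best_sim = None, -np.inf
    for concept, cvec in concept_vecs.items():
        denom = np.linalg.norm(projected) * np.linalg.norm(cvec) + 1e-12
        sim = float(projected @ cvec) / denom
        if sim > best_sim:
            best, best_sim = concept, sim
    return best

# Hypothetical toy ontology: each concept maps to its direct parents.
parents = {"bacteria": [], "habitat": [], "soil": ["habitat"], "water": ["habitat"]}
concept_vecs = ontology_vectors(parents)

# Hypothetical distributional vectors for three annotated terms.
train_terms = {
    "soil sample": (np.array([1.0, 0.0, 0.1]), "soil"),
    "lake water": (np.array([0.0, 1.0, 0.1]), "water"),
    "bacterium": (np.array([0.0, 0.0, 1.0]), "bacteria"),
}
X = np.stack([vec for vec, _ in train_terms.values()])
Y = np.stack([concept_vecs[label] for _, label in train_terms.values()])
W = fit_linear_map(X, Y)

# An unseen term whose vector lies close to "soil sample".
predicted = normalize(np.array([0.9, 0.1, 0.1]), W, concept_vecs)
print(predicted)
```

Because each concept vector in this sketch shares dimensions with its ancestors, a projected term that misses the exact concept still tends to land near a related one, which is one plausible reason for structuring the target space by the ontology.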

Normalisation de termes complexes par sémantique distributionnelle guidée par une ontologie (Normalization of complex terms with distributional semantics guided by an ontology)
Arnaud Ferré
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. 19es REncontres jeunes Chercheurs en Informatique pour le TAL (RECITAL 2017)

In this paper we propose an original semi-supervised method for creating vector representations of terms (complex or not) in a semantic space suited to the task of normalizing terms that denote entities in a corpus. Our method relies in part on a distributional semantics approach, which generates initial vectors for each of the extracted terms. These vectors are then embedded in another vector space built from the structure of an ontology. Several methods for constructing this second, ontological vector space are tested and compared. The embedding is carried out by training a linear model. Finally, a distance calculation (using cosine similarity) determines the proximity between term vectors and the vectors of the ontology concepts used for normalization. The method achieved a respectable ranking, opening up encouraging prospects.

2016

Co-authors

Estelle Chaix 1

Philippe Bessières 1

Thomas Lavergne 1