One model per entity: using hundreds of machine learning models to recognize and normalize biomedical names in text

Victor Bellon, Raul Rodriguez-Esteban


Abstract
We explored a new approach to named entity recognition based on hundreds of machine learning models, each trained to distinguish a single entity, and showed its application to gene name identification (GNI). The rationale for our approach, which we named “one model per entity” (OMPE), was that increasing the number of models would make the learning task easier for each individual model. Our training strategy leveraged freely available database annotations instead of manually annotated corpora. While its performance in our proof-of-concept was disappointing, we believe that there is enough room for improvement that such approaches could reach competitive performance while eliminating the cost of creating manually annotated training corpora.
Anthology ID:
W17-8007
Volume:
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Editors:
Svetla Boytcheva, Kevin Bretonnel Cohen, Guergana Savova, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
49–54
URL:
https://doi.org/10.26615/978-954-452-044-1_007
DOI:
10.26615/978-954-452-044-1_007
Cite (ACL):
Victor Bellon and Raul Rodriguez-Esteban. 2017. One model per entity: using hundreds of machine learning models to recognize and normalize biomedical names in text. In Proceedings of the Biomedical NLP Workshop associated with RANLP 2017, pages 49–54, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
One model per entity: using hundreds of machine learning models to recognize and normalize biomedical names in text (Bellon & Rodriguez-Esteban, RANLP 2017)