Matthieu Doutreligne


2018

pdf bib
Evaluation of a Sequence Tagging Tool for Biomedical Texts
Julien Tourille | Matthieu Doutreligne | Olivier Ferret | Aurélie Névéol | Nicolas Paris | Xavier Tannier
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

Many applications in biomedical natural language processing rely on sequence tagging as an initial step to perform more complex analysis. To support text analysis in the biomedical domain, we introduce Yet Another SEquence Tagger (YASET), an open-source multi purpose sequence tagger that implements state-of-the-art deep learning algorithms for sequence tagging. Herein, we evaluate YASET on part-of-speech tagging and named entity recognition in a variety of text genres including articles from the biomedical literature in English and clinical narratives in French. To further characterize performance, we report distributions over 30 runs and different sizes of training datasets. YASET provides state-of-the-art performance on the CoNLL 2003 NER dataset (F1=0.87), MEDPOST corpus (F1=0.97), MERLoT corpus (F1=0.99) and NCBI disease corpus (F1=0.81). We believe that YASET is a versatile and efficient tool that can be used for sequence tagging in biomedical and clinical texts.