Jose Barros
2022
Divide and Conquer: An Extreme Multi-Label Classification Approach for Coding Diseases and Procedures in Spanish
Jose Barros | Matias Rojas | Jocelyn Dunstan | Andres Abeliuk
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)
Clinical coding is the task of transforming medical documents into structured codes following a standard ontology. Since these terminologies are composed of hundreds of codes, this problem can be considered an Extreme Multi-label Classification task. This paper proposes a novel neural network-based architecture for clinical coding. First, we take full advantage of the hierarchical nature of ontologies to create clusters based on semantic relations. Then, we use a Matcher module to assign the probability of documents belonging to each cluster. Finally, the Ranker calculates the probability of each code considering only the documents in the cluster. This division allows a fine-grained differentiation within the cluster, which cannot be addressed using a single classifier. In addition, since most of the previous work has focused on solving this task in English, we conducted our experiments on three clinical coding corpora in Spanish. The experimental results demonstrate the effectiveness of our model, achieving state-of-the-art results on two of the three datasets. Specifically, we outperformed previous models on two subtasks of the CodiEsp shared task: CodiEsp-D (diseases) and CodiEsp-P (procedures). Automatic coding can profoundly impact healthcare by structuring critical information written in free text in electronic health records.
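The two-stage idea in the abstract above (cluster the codes, score clusters with a Matcher, then score codes inside the selected clusters with a Ranker) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the prefix-based clustering stands in for the ontology-based clusters, the probabilities are hand-written toy values rather than neural network outputs, and multiplying the two probabilities is just one plausible way to combine the stages.

```python
from collections import defaultdict


def cluster_by_prefix(codes, prefix_len=3):
    """Group codes by shared prefix (hypothetical stand-in for ontology clusters)."""
    clusters = defaultdict(list)
    for code in codes:
        clusters[code[:prefix_len]].append(code)
    return dict(clusters)


def rank_codes(matcher_probs, ranker_probs, clusters, top_clusters=2):
    """Score codes only inside the clusters the Matcher considers most likely.

    matcher_probs: {cluster_id: P(cluster | document)}
    ranker_probs:  {code: P(code | document, cluster)}
    The product of the two is an assumed combination rule, not the paper's.
    """
    best = sorted(matcher_probs, key=matcher_probs.get, reverse=True)[:top_clusters]
    scores = {}
    for cluster_id in best:
        for code in clusters[cluster_id]:
            scores[code] = matcher_probs[cluster_id] * ranker_probs[code]
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy ICD-10-style codes and made-up probabilities for a single document.
clusters = cluster_by_prefix(["e11.9", "e11.65", "i10", "i25.10"])
matcher = {"e11": 0.8, "i10": 0.5, "i25": 0.1}
ranker = {"e11.9": 0.9, "e11.65": 0.2, "i10": 0.7, "i25.10": 0.6}
ranked = rank_codes(matcher, ranker, clusters)
```

With these toy values the low-probability cluster is never scored at all, which is the point of the division: the Ranker only has to separate codes that are already semantically close.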
PLN CMM at SocialDisNER: Improving Detection of Disease Mentions in Tweets by Using Document-Level Features
Matias Rojas | Jose Barros | Kinan Martin | Mauricio Araneda-Hernandez | Jocelyn Dunstan
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
This paper describes our approach to the SocialDisNER task, part of the Social Media Mining for Health Applications (SMM4H) shared task. The task aims to identify disease mentions in tweets written in Spanish. The proposed model is an architecture based on the FLERT approach: it fine-tunes a language model that builds the input representation of a sentence from its neighboring sentences, thus capturing document-level context. The best result was obtained with an ensemble of six language models trained with the FLERT approach. The system achieved an F1 score of 0.862 on the test partition, well above the 0.680 average among competing systems.
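The two mechanisms named in the abstract above, document-level context and model ensembling, can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the authors' system: the sentence-level window, the majority-vote rule, and the `B-DIS`/`O` label names are all hypothetical choices, and the abstract does not say how the six models were combined.

```python
from collections import Counter


def with_context(sentences, index, window=1):
    """Return (left context, target sentence, right context) for one sentence.

    The window is counted in sentences here; this is an assumed simplification.
    """
    left = sentences[max(0, index - window):index]
    right = sentences[index + 1:index + 1 + window]
    return left, sentences[index], right


def majority_vote(predictions):
    """Combine per-token labels from several models by majority vote.

    predictions: one label sequence per model, all of equal length.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Toy document of three sentences; the middle one sees both neighbors.
doc = ["first sentence", "second sentence", "third sentence"]
context = with_context(doc, 1)

# Three hypothetical models labeling two tokens.
combined = majority_vote([["O", "B-DIS"], ["O", "B-DIS"], ["B-DIS", "O"]])
```

Only the target sentence would receive labels; the neighbors serve purely as extra input for the language model.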