Andres Abeliuk
2022
Divide and Conquer: An Extreme Multi-Label Classification Approach for Coding Diseases and Procedures in Spanish
Jose Barros | Matias Rojas | Jocelyn Dunstan | Andres Abeliuk
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)
Clinical coding is the task of transforming medical documents into structured codes following a standard ontology. Since these terminologies are composed of hundreds of codes, this problem can be considered an Extreme Multi-label Classification task. This paper proposes a novel neural network-based architecture for clinical coding. First, we take full advantage of the hierarchical nature of ontologies to create clusters based on semantic relations. Then, we use a Matcher module to assign the probability of documents belonging to each cluster. Finally, the Ranker calculates the probability of each code considering only the documents in the cluster. This division allows a fine-grained differentiation within the cluster, which cannot be addressed using a single classifier. In addition, since most of the previous work has focused on solving this task in English, we conducted our experiments on three clinical coding corpora in Spanish. The experimental results demonstrate the effectiveness of our model, achieving state-of-the-art results on two of the three datasets. Specifically, we outperformed previous models on two subtasks of the CodiEsp shared task: CodiEsp-D (diseases) and CodiEsp-P (procedures). Automatic coding can profoundly impact healthcare by structuring critical information written in free text in electronic health records.
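The Matcher and Ranker steps described in the abstract can be illustrated with a minimal sketch. It assumes that the final score of a code is the product of the Matcher's cluster probability and the Ranker's within-cluster code probability, and that only the top-scoring clusters are passed on to the Ranker; the abstract does not state either detail. The function name `score_codes`, the parameter `top_k_clusters`, and the cluster labels and probabilities are hypothetical.

```python
def score_codes(matcher_probs, ranker_probs, top_k_clusters=2):
    """Combine cluster-level and code-level probabilities for one document.

    matcher_probs: dict mapping cluster name -> P(cluster | document)
    ranker_probs:  dict mapping cluster name -> {code: P(code | document, cluster)}
    Assumption (not from the paper): scores are multiplied, and each code
    belongs to exactly one cluster.
    """
    # Keep only the clusters the Matcher considers most likely.
    top_clusters = sorted(matcher_probs, key=matcher_probs.get, reverse=True)
    top_clusters = top_clusters[:top_k_clusters]

    scores = {}
    for cluster in top_clusters:
        # The Ranker only distinguishes between codes inside this cluster.
        for code, code_prob in ranker_probs[cluster].items():
            scores[code] = matcher_probs[cluster] * code_prob

    # Return codes ordered from most to least likely.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical probabilities for a single document.
matcher_probs = {"respiratory": 0.8, "cardiac": 0.15, "renal": 0.05}
ranker_probs = {
    "respiratory": {"J18.9": 0.9, "J45": 0.2},
    "cardiac": {"I10": 0.6},
    "renal": {"N18": 0.7},
}
ranked = score_codes(matcher_probs, ranker_probs)
print(list(ranked))
```

Restricting the Ranker to a few clusters is what keeps the label space small at each step, which is the point of the divide-and-conquer framing.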
2021
Detecting Polarized Topics Using Partisanship-aware Contextualized Topic Embeddings
Zihao He | Negar Mokhberian | António Câmara | Andres Abeliuk | Kristina Lerman
Findings of the Association for Computational Linguistics: EMNLP 2021
Growing polarization of the news media has been blamed for fanning disagreement, controversy and even violence. Early identification of polarized topics is thus an urgent matter that can help mitigate conflict. However, accurate measurement of topic-wise polarization is still an open research challenge. To address this gap, we propose Partisanship-aware Contextualized Topic Embeddings (PaCTE), a method to automatically detect polarized topics from partisan news sources. Specifically, using a language model that has been fine-tuned to recognize the partisanship of news articles, we represent the ideology of a news corpus on a topic by a corpus-contextualized topic embedding and measure the polarization using cosine distance. We apply our method to a dataset of news articles about the COVID-19 pandemic. Extensive experiments on different news sources and topics demonstrate the efficacy of our method in capturing topical polarization, as indicated by its effectiveness in retrieving the most polarized topics.
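The polarization measure mentioned in the abstract, cosine distance between topic embeddings, can be sketched as follows. This covers only the distance and ranking step: it assumes each partisan corpus has already been reduced to one embedding vector per topic, and it does not reproduce how PaCTE builds those embeddings. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def rank_topics_by_polarization(corpus_a, corpus_b):
    """Rank topics shared by both corpora, most polarized first.

    corpus_a, corpus_b: dicts mapping topic name -> embedding (list of floats).
    """
    scores = {
        topic: cosine_distance(corpus_a[topic], corpus_b[topic])
        for topic in corpus_a
        if topic in corpus_b
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy two-dimensional embeddings (illustrative values only).
corpus_a = {"masks": [1.0, 0.0], "vaccines": [1.0, 1.0]}
corpus_b = {"masks": [0.0, 1.0], "vaccines": [1.0, 1.0]}
for topic, distance in rank_topics_by_polarization(corpus_a, corpus_b):
    print(topic, round(distance, 3))
```

A larger distance means the two corpora represent the topic more differently, so the topics at the top of the ranking are the candidates for being polarized.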