BERT for Long Documents: A Case Study of Automated ICD Coding

Arash Afkanpour; Shabir Adeel; Hansenclever Bassani; Arkady Epshteyn; Hongbo Fan; Isaac Jones; Mahan Malihi; Adrian Nauth; Raj Sinha; Sanjana Woonna; Shiva Zamani; Elli Kanal; Mikhail Fomitchev; Donny Cheung

doi:10.18653/v1/2022.louhi-1.12

Abstract

Transformer models have achieved great success across many NLP problems. However, previous studies in automated ICD coding concluded that these models fail to outperform some earlier solutions, such as CNN-based models. In this paper, we challenge this conclusion. We present a simple and scalable method for processing long text with existing transformer models such as BERT. We show that this method significantly improves on the results previously reported for transformer models in ICD coding and outperforms one of the prominent CNN-based methods.
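The abstract does not spell out the method, so the following is only a generic sketch of one common way to give a BERT-style encoder text longer than its 512-token input limit. The token ids are split into fixed-size chunks, each chunk would be encoded separately, and the per-chunk label scores are combined into one document-level score per label. The function names, the max-pooling aggregation, and the special-token ids (101 for [CLS] and 102 for [SEP], as in the bert-base-uncased vocabulary) are assumptions for illustration. They are not necessarily the paper's approach.

```python
from typing import List, Sequence


def chunk_token_ids(token_ids: Sequence[int], max_len: int = 512,
                    cls_id: int = 101, sep_id: int = 102) -> List[List[int]]:
    """Split a long token-id sequence into non-overlapping chunks that each fit
    an encoder with a max_len input limit, wrapping every chunk in [CLS] ... [SEP].
    cls_id/sep_id defaults are assumed bert-base-uncased ids."""
    body = max_len - 2  # reserve two positions for [CLS] and [SEP]
    if body <= 0:
        raise ValueError("max_len must leave room for the special tokens")
    chunks = []
    for start in range(0, len(token_ids), body):
        piece = list(token_ids[start:start + body])
        chunks.append([cls_id] + piece + [sep_id])
    return chunks


def max_pool_scores(chunk_scores: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-chunk label scores (one row per chunk, one column per label)
    into document-level scores by taking the maximum over chunks.
    Max pooling is one simple, assumed aggregation choice."""
    if not chunk_scores:
        raise ValueError("need at least one chunk")
    num_labels = len(chunk_scores[0])
    return [max(scores[j] for scores in chunk_scores) for j in range(num_labels)]
```

For example, a 1,000-token document with `max_len=512` yields two chunks: 510 body tokens plus two special tokens, and then the remaining 490 body tokens plus two special tokens. Max pooling keeps a label's score high if any single chunk provides strong evidence for it. This suits multi-label coding, where the text supporting a code may appear anywhere in a long note.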

Anthology ID:: 2022.louhi-1.12
Volume:: Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)
Month:: December
Year:: 2022
Address:: Abu Dhabi, United Arab Emirates (Hybrid)
Editors:: Alberto Lavelli, Eben Holderness, Antonio Jimeno Yepes, Anne-Lyse Minard, James Pustejovsky, Fabio Rinaldi
Venue:: Louhi
Publisher:: Association for Computational Linguistics
Pages:: 100–107
URL:: https://aclanthology.org/2022.louhi-1.12/
DOI:: 10.18653/v1/2022.louhi-1.12
Cite (ACL):: Arash Afkanpour, Shabir Adeel, Hansenclever Bassani, Arkady Epshteyn, Hongbo Fan, Isaac Jones, Mahan Malihi, Adrian Nauth, Raj Sinha, Sanjana Woonna, Shiva Zamani, Elli Kanal, Mikhail Fomitchev, and Donny Cheung. 2022. BERT for Long Documents: A Case Study of Automated ICD Coding. In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI), pages 100–107, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):: BERT for Long Documents: A Case Study of Automated ICD Coding (Afkanpour et al., Louhi 2022)
PDF:: https://aclanthology.org/2022.louhi-1.12.pdf
Video:: https://aclanthology.org/2022.louhi-1.12.mp4