CXR Data Annotation and Classification with Pre-trained Language Models

Nina Zhou; Aiti Aw; Zhuo Han Liu; Cher heng Tan; Yonghan Ting; Wen Xiang Chen; Jordan sim zheng Ting

Abstract

Clinical data annotation has been one of the major obstacles to applying machine learning approaches in clinical NLP. Open-source tools such as NegBio and CheXpert are usually designed on data from specific institutions, which limits their application to other institutions because of differences in writing style, structure, language use and label definitions. In this paper, we propose a new weak supervision annotation framework with two improvements over existing annotation frameworks: 1) we select representative samples for efficient manual annotation; 2) we auto-annotate the remaining samples. Both steps leverage a self-trained sentence encoder. The framework also provides a function for identifying annotation errors caused by inconsistent labelling. Our proposed weak supervision annotation framework is applicable to any data annotation task, and it provides efficient sample selection and data auto-annotation, with better classification results in real applications.
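The abstract does not say how representative samples are chosen, how labels are propagated to the remaining samples, or how inconsistent annotations are detected. As a rough illustration only, the sketch below assumes each report (or sentence) has already been mapped to a vector by some sentence encoder, and uses three stand-in techniques: greedy k-center selection over cosine similarity, nearest-labelled-neighbour label propagation, and a k-nearest-neighbour majority-vote check for suspicious labels. The function names, toy vectors and the "normal"/"abnormal" labels are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_representatives(embs: Sequence[Vector], k: int) -> List[int]:
    """Greedy k-center selection (a stand-in, not the paper's method): repeatedly
    pick the sample least similar to every sample picked so far."""
    if not embs or k <= 0:
        return []
    chosen = [0]  # deterministic (arbitrary) starting point
    # closest[i] = similarity of sample i to its most similar chosen sample
    closest = [cosine(e, embs[0]) for e in embs]
    while len(chosen) < min(k, len(embs)):
        candidates = [i for i in range(len(embs)) if i not in chosen]
        nxt = min(candidates, key=lambda i: closest[i])
        chosen.append(nxt)
        closest = [max(c, cosine(e, embs[nxt])) for c, e in zip(closest, embs)]
    return chosen


def auto_annotate(embs: Sequence[Vector], labels: Dict[int, str]) -> Dict[int, str]:
    """Give every unlabelled sample the label of its most similar labelled sample."""
    out = dict(labels)
    for i, e in enumerate(embs):
        if i not in labels:
            nearest = max(labels, key=lambda j: cosine(e, embs[j]))
            out[i] = labels[nearest]
    return out


def flag_inconsistent(embs: Sequence[Vector], labels: Dict[int, str], k: int = 3) -> List[int]:
    """Flag labelled samples whose label disagrees with the majority label of
    their k most similar labelled neighbours (candidate annotation errors)."""
    flagged = []
    for i, label in labels.items():
        neighbours = sorted(
            (j for j in labels if j != i),
            key=lambda j: cosine(embs[i], embs[j]),
            reverse=True,
        )[:k]
        if neighbours:
            majority, _ = Counter(labels[j] for j in neighbours).most_common(1)[0]
            if majority != label:
                flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings": two well-separated groups of four samples.
    embs = [(1.0, 0.0), (0.9, 0.1), (0.95, 0.05), (0.85, 0.15),
            (0.0, 1.0), (0.1, 0.9), (0.05, 0.95), (0.15, 0.85)]
    print(select_representatives(embs, 2))  # → [0, 4]
    print(auto_annotate(embs, {0: "normal", 4: "abnormal"}))
    manual = {0: "normal", 1: "normal", 2: "normal", 3: "normal",
              4: "abnormal", 5: "abnormal", 6: "abnormal", 7: "normal"}
    print(flag_inconsistent(embs, manual, k=3))  # → [7]
```

In this toy run, one sample per group is picked for manual labelling, the rest inherit the label of the nearest labelled sample, and sample 7 is flagged because its three nearest labelled neighbours all disagree with it. Greedy k-center is only one reasonable selection heuristic; clustering the embeddings and annotating the sample nearest each centroid is a common alternative.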

Anthology ID: 2022.coling-1.247
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 2801–2811
URL: https://aclanthology.org/2022.coling-1.247/
Cite (ACL): Nina Zhou, Ai Ti Aw, Zhuo Han Liu, Cher heng Tan, Yonghan Ting, Wen Xiang Chen, and Jordan sim zheng Ting. 2022. CXR Data Annotation and Classification with Pre-trained Language Models. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2801–2811, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): CXR Data Annotation and Classification with Pre-trained Language Models (Zhou et al., COLING 2022)
PDF: https://aclanthology.org/2022.coling-1.247.pdf