RNN Embeddings for Identifying Difficult to Understand Medical Words

Hanna Pylieva, Artem Chernodub, Natalia Grabar, Thierry Hamon


Abstract
Patients and their families often require a better understanding of medical information provided by doctors. We address this issue by improving the identification of difficult-to-understand medical words. We introduce novel embeddings obtained from an RNN, FrnnMUTE (French RNN Medical Understandability Text Embeddings), which make it possible to reach an F1 score of up to 87.0 in the identification of difficult words. We also note that adding pre-trained FastText word embeddings to the feature set substantially improves the performance of the model that classifies words according to their difficulty. We study the generalizability of different models through three cross-validation scenarios that allow classifiers to be tested in real-world conditions: understanding of medical words by new users, and classification of new unseen words by the automatic models. The FrnnMUTE embeddings and the categorization code are being made available for research.
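The generalization scenarios mentioned in the abstract (new users, new unseen words) can be illustrated with a grouped cross-validation split, in which every annotation belonging to a held-out user or word is kept out of training. The sketch below is only an illustration under stated assumptions, not the authors' protocol: the record layout, the helper names `grouped_folds` and `f1_score`, and the toy annotations are all hypothetical.

```python
def grouped_folds(records, key, n_folds):
    """Yield (train, test) lists such that no value of `key` occurs in both.

    Hypothetical helper: with key="word" the test words are unseen in
    training; with key="user" the test annotators are unseen in training.
    """
    groups = sorted({r[key] for r in records})
    for fold in range(n_folds):
        # Assign groups to folds round-robin by their sorted position.
        held_out = {g for i, g in enumerate(groups) if i % n_folds == fold}
        train = [r for r in records if r[key] not in held_out]
        test = [r for r in records if r[key] in held_out]
        yield train, test


def f1_score(gold, predicted, positive=1):
    """F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy annotations invented for this sketch (1 = difficult, 0 = understandable).
words = ["fievre", "hepatomegalie", "toux", "dyspnee"]
toy_labels = {"fievre": 0, "hepatomegalie": 1, "toux": 0, "dyspnee": 1}
records = [
    {"user": user, "word": word, "label": toy_labels[word]}
    for user in ("u1", "u2")
    for word in words
]

for key in ("user", "word"):
    for train, test in grouped_folds(records, key, n_folds=2):
        print(key, "train:", len(train), "test:", len(test))
```

Grouping by `word` approximates the unseen-word condition and grouping by `user` approximates the new-user condition; a classifier trained on each `train` split would then be scored on the matching `test` split with `f1_score`.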
Anthology ID:
W19-5011
Volume:
Proceedings of the 18th BioNLP Workshop and Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
97–104
URL:
https://aclanthology.org/W19-5011
DOI:
10.18653/v1/W19-5011
Cite (ACL):
Hanna Pylieva, Artem Chernodub, Natalia Grabar, and Thierry Hamon. 2019. RNN Embeddings for Identifying Difficult to Understand Medical Words. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 97–104, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
RNN Embeddings for Identifying Difficult to Understand Medical Words (Pylieva et al., BioNLP 2019)
PDF:
https://aclanthology.org/W19-5011.pdf
Code:
hpylieva/FrnnMUTE