Cross-Language Transfer of High-Quality Annotations: Combining Neural Machine Translation with Cross-Linguistic Span Alignment to Apply NER to Clinical Texts in a Low-Resource Language

Henning Schäfer, Ahmad Idrissi-Yaghir, Peter Horn, Christoph Friedrich


Abstract
In this work, cross-linguistic span prediction based on contextualized word embedding models is used together with neural machine translation (NMT) to transfer and apply state-of-the-art natural language processing (NLP) models to a clinical corpus in a low-resource language. Two directions are evaluated: (a) English models are applied to translated texts, and the predicted annotations are then transferred back to the source language; and (b) existing high-quality annotations are carried over through translation and then used to train NLP models in the target language. Effectiveness and transfer loss are evaluated using the German Berlin-Tübingen-Oncology Corpus (BRONCO) dataset with transferred external data from NCBI disease, SemEval-2013 drug-drug interaction (DDI) and i2b2/VA 2010 data. Applying English models to translated clinical texts is motivated by the benefits associated with those models, such as large pre-trained biomedical word embeddings. To support further work in this area, we provide a general-purpose pipeline that transfers annotated data in BRAT or CoNLL format to various target languages. For the entity class medication, good results were obtained with 0.806 F1-score after re-alignment. Results for the diagnosis and treatment classes were more limited, at just below 0.5 F1-score, owing to differences in annotation guidelines.
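The idea of carrying an annotation from one language to its translation can be illustrated with a minimal sketch. It is not the authors' method: the paper uses a span prediction model built on contextualized embeddings, whereas the sketch below substitutes a plain token-alignment projection. The example sentences, the alignment pairs, and the names `project_span` and `transfer` are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # inclusive token indices (start, end)


def project_span(span: Span, alignment: List[Tuple[int, int]]) -> Optional[Span]:
    """Project a source token span onto the target sentence.

    `alignment` holds (source_index, target_index) pairs. The result is the
    smallest contiguous target span covering every target token aligned to
    the source span, or None if nothing is aligned.
    """
    start, end = span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None
    return (min(targets), max(targets))


def transfer(entities: List[Tuple[Span, str]],
             alignment: List[Tuple[int, int]]) -> List[Tuple[Span, str]]:
    """Transfer labeled spans; entities without any aligned token are dropped."""
    projected = []
    for span, label in entities:
        target_span = project_span(span, alignment)
        if target_span is not None:
            projected.append((target_span, label))
    return projected


# Hypothetical English sentence, German translation, and word alignment.
en_tokens = ["The", "patient", "received", "cisplatin", "therapy"]
de_tokens = ["Der", "Patient", "erhielt", "eine", "Cisplatin-Therapie"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 4)]

# One annotated entity on the English side (token 3, "cisplatin").
en_entities = [((3, 3), "MEDICATION")]

de_entities = transfer(en_entities, alignment)
for (start, end), label in de_entities:
    print(label, de_tokens[start:end + 1])
```

A projection of this kind loses precision whenever one target token covers several source tokens, as with the German compound here, which is one reason a learned span alignment model may be preferable.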
Anthology ID:
2022.clinicalnlp-1.6
Volume:
Proceedings of the 4th Clinical Natural Language Processing Workshop
Month:
July
Year:
2022
Address:
Seattle, WA
Editors:
Tristan Naumann, Steven Bethard, Kirk Roberts, Anna Rumshisky
Venue:
ClinicalNLP
Publisher:
Association for Computational Linguistics
Pages:
53–62
URL:
https://aclanthology.org/2022.clinicalnlp-1.6
DOI:
10.18653/v1/2022.clinicalnlp-1.6
Cite (ACL):
Henning Schäfer, Ahmad Idrissi-Yaghir, Peter Horn, and Christoph Friedrich. 2022. Cross-Language Transfer of High-Quality Annotations: Combining Neural Machine Translation with Cross-Linguistic Span Alignment to Apply NER to Clinical Texts in a Low-Resource Language. In Proceedings of the 4th Clinical Natural Language Processing Workshop, pages 53–62, Seattle, WA. Association for Computational Linguistics.
Cite (Informal):
Cross-Language Transfer of High-Quality Annotations: Combining Neural Machine Translation with Cross-Linguistic Span Alignment to Apply NER to Clinical Texts in a Low-Resource Language (Schäfer et al., ClinicalNLP 2022)
PDF:
https://aclanthology.org/2022.clinicalnlp-1.6.pdf
Video:
 https://aclanthology.org/2022.clinicalnlp-1.6.mp4
Code
 0xhesch/clat-cross-lingual-annotation-transfer
Data
2010 i2b2/VA, DDI, MIMIC-III, NCBI Disease