Resolving Elliptical Compounds in German Medical Text

Niklas Kammer, Florian Borchert, Silvia Winkler, Gerard de Melo, Matthieu-P. Schapranow


Abstract
Elliptical coordinated compound noun phrases (ECCNPs), a special kind of coordination ellipsis, are a common phenomenon in German medical texts. As their presence is known to affect performance on downstream tasks such as entity extraction and disambiguation, their resolution can be a useful preprocessing step in information extraction pipelines. In this work, we present a new comprehensive dataset of more than 4,000 manually annotated ECCNPs in German medical text, along with the respective ground truth resolutions. Based on this data, we propose a generative encoder-decoder Transformer model that allows for simple end-to-end resolution of ECCNPs from raw input strings with very high accuracy (90.5% exact match score). We compare our approach to an elaborate rule-based baseline, which the generative model outperforms by a large margin. We further investigate different scenarios for prompting large language models (LLMs) to resolve ECCNPs. In a zero-shot setting, performance is remarkably poor (21.6% exact matches), as the LLM tends to make complex changes to the inputs that are unrelated to our specific task. We also find no improvement over the generative model when using the LLM for post-filtering of generated candidate resolutions.
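To make the task and the exact match metric concrete, here is a minimal toy sketch in Python. It is not the authors' rule-based baseline or their Transformer model. It assumes only the simplest backward-ellipsis pattern (for example, the illustrative input "Leber- und Nierenfunktion", which should become "Leberfunktion und Nierenfunktion"), and it relies on a tiny hand-written head lexicon in place of a real German compound splitter. All names (`HEADS`, `PATTERN`, `split_head`, `resolve`, `exact_match`) are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical toy lexicon of compound heads. A real system would need a
# proper German compound splitter instead of a hand-written list.
HEADS = {"funktion", "insuffizienz", "tumor", "untersuchung", "therapie", "gelenk"}

# Matches a truncated first conjunct ("Leber-"), a coordinating conjunction,
# and the full compound that supplies the shared head ("Nierenfunktion").
PATTERN = re.compile(r"\b(\w+)-\s+(und|oder|bzw\.)\s+(\w+)")


def split_head(compound: str) -> str | None:
    """Return the longest lexicon head that the compound ends with, if any."""
    lower = compound.lower()
    candidates = [h for h in HEADS if lower.endswith(h) and len(lower) > len(h)]
    return max(candidates, key=len) if candidates else None


def resolve(text: str) -> str:
    """Expand simple two-conjunct backward ellipses such as 'Leber- und Nierenfunktion'."""

    def expand(match: re.Match) -> str:
        truncated, conj, full = match.groups()
        head = split_head(full)
        if head is None:
            return match.group(0)  # leave the phrase unchanged if no head is found
        return f"{truncated}{head} {conj} {full}"

    return PATTERN.sub(expand, text)


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predicted resolutions identical to their reference strings."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)
```

For instance, `resolve("Hüft- oder Kniegelenk")` yields `"Hüftgelenk oder Kniegelenk"`. The sketch deliberately ignores harder cases such as three or more conjuncts, forward ellipses, and linking elements. Brittleness of this kind is one motivation for learning resolutions end to end from annotated data, as the abstract describes.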
Anthology ID:
2023.bionlp-1.26
Volume:
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Kevin Cohen
Venue:
BioNLP
Publisher:
Association for Computational Linguistics
Pages:
292–305
URL:
https://aclanthology.org/2023.bionlp-1.26
DOI:
10.18653/v1/2023.bionlp-1.26
Cite (ACL):
Niklas Kammer, Florian Borchert, Silvia Winkler, Gerard de Melo, and Matthieu-P. Schapranow. 2023. Resolving Elliptical Compounds in German Medical Text. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 292–305, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Resolving Elliptical Compounds in German Medical Text (Kammer et al., BioNLP 2023)
PDF:
https://aclanthology.org/2023.bionlp-1.26.pdf