Domain adaptation in practice: Lessons from a real-world information extraction pipeline

Timothy Miller, Egoitz Laparra, Steven Bethard


Abstract
Advances in transfer learning and domain adaptation have raised hopes that once-challenging NLP tasks are ready to be put to use for sophisticated information extraction needs. In this work, we describe an effort to do just that – combining state-of-the-art neural methods for negation detection, document time relation extraction, and aspectual link prediction, with the eventual goal of extracting drug timelines from electronic health record text. We train on the THYME colon cancer corpus and test on both the THYME brain cancer corpus and an internal corpus, and show that performance of the combined systems is unacceptable despite good performance of individual systems. Although domain adaptation shows improvements on each individual system, the model selection problem is a barrier to improving overall pipeline performance.
Anthology ID:
2021.adaptnlp-1.11
Volume:
Proceedings of the Second Workshop on Domain Adaptation for NLP
Month:
April
Year:
2021
Address:
Kyiv, Ukraine
Editors:
Eyal Ben-David, Shay Cohen, Ryan McDonald, Barbara Plank, Roi Reichart, Guy Rotman, Yftah Ziser
Venue:
AdaptNLP
Publisher:
Association for Computational Linguistics
Pages:
105–110
URL:
https://aclanthology.org/2021.adaptnlp-1.11
Cite (ACL):
Timothy Miller, Egoitz Laparra, and Steven Bethard. 2021. Domain adaptation in practice: Lessons from a real-world information extraction pipeline. In Proceedings of the Second Workshop on Domain Adaptation for NLP, pages 105–110, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal):
Domain adaptation in practice: Lessons from a real-world information extraction pipeline (Miller et al., AdaptNLP 2021)
PDF:
https://aclanthology.org/2021.adaptnlp-1.11.pdf