A Data-driven Approach for Noise Reduction in Distantly Supervised Biomedical Relation Extraction

Saadullah Amin, Katherine Ann Dunfield, Anna Vechkaeva, Guenter Neumann


Abstract
Fact triples are a common form of structured knowledge used within the biomedical domain. As the volume of unstructured scientific text continues to grow, manually annotating it for relation extraction becomes increasingly expensive. Distant supervision offers a viable alternative by quickly producing large amounts of labeled, but considerably noisy, data. We aim to reduce such noise in two ways: by extending an entity-enriched relation classification BERT model to the problem of multiple instance learning, and by defining a simple data encoding scheme that significantly reduces noise. Together these reach state-of-the-art performance for distantly supervised biomedical relation extraction. Our approach further encodes knowledge about the direction of relation triples, allowing for increased focus on relation learning by reducing noise and alleviating the need for joint learning with knowledge graph completion.
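To illustrate the setting the abstract describes, the following minimal sketch shows how distant supervision might group sentences into bags for multiple instance learning. It is not the authors' implementation: the toy triple, the example sentences, the `build_bags` helper, and the `head_first`/`tail_first` tags are all hypothetical, and the paper's actual encoding scheme is not reproduced here.

```python
from collections import defaultdict

# Toy knowledge base mapping (head, tail) to a relation.
# Illustrative values only; not taken from the paper's data.
KB = {("aspirin", "headache"): "may_treat"}


def build_bags(sentences, kb):
    """Group sentences by fact triple, labeling each bag from the knowledge base."""
    bags = defaultdict(list)
    for text in sentences:
        lowered = text.lower()
        for (head, tail), relation in kb.items():
            h, t = lowered.find(head), lowered.find(tail)
            if h == -1 or t == -1:
                continue  # both entities must be mentioned in the sentence
            # Record whether the head entity is mentioned before the tail entity.
            order = "head_first" if h < t else "tail_first"
            bags[(head, relation, tail)].append((text, order))
    return dict(bags)


sentences = [
    "Aspirin is commonly used to relieve headache.",
    "The patient reported a headache after taking aspirin.",
    "Ibuprofen reduces fever.",
]
bags = build_bags(sentences, KB)
```

In this toy bag, the second sentence mentions both entities without expressing the relation, which is the kind of label noise distant supervision introduces. Multiple instance learning predicts one relation per bag rather than per sentence, so a single noisy sentence need not determine the label.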
Anthology ID:
2020.bionlp-1.20
Volume:
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing
Month:
July
Year:
2020
Address:
Online
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
187–194
URL:
https://aclanthology.org/2020.bionlp-1.20
DOI:
10.18653/v1/2020.bionlp-1.20
Cite (ACL):
Saadullah Amin, Katherine Ann Dunfield, Anna Vechkaeva, and Guenter Neumann. 2020. A Data-driven Approach for Noise Reduction in Distantly Supervised Biomedical Relation Extraction. In Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pages 187–194, Online. Association for Computational Linguistics.
Cite (Informal):
A Data-driven Approach for Noise Reduction in Distantly Supervised Biomedical Relation Extraction (Amin et al., BioNLP 2020)
PDF:
https://aclanthology.org/2020.bionlp-1.20.pdf
Software:
 2020.bionlp-1.20.Software.zip
Video:
 http://slideslive.com/38929644
Code:
 suamin/umls-medline-distant-re