Towards Confidence Estimation for Typed Protein-Protein Relation Extraction

Camilo Thorne, Roman Klinger


Abstract
Systems that build on top of information extraction are typically challenged to extract knowledge that, while correct, is not yet well-known. We hypothesize that a good confidence measure for relational information has the property that such interesting information is found between information extracted with very high confidence and very low confidence. We discuss confidence estimation for protein-protein relation discovery in biomedical literature. As facts reported in papers take some time to be validated and recorded in biomedical databases, this task gives rise to large quantities of unknown but potentially true candidate relations. It is thus important to rank them based on supporting evidence rather than discard them. In this paper, we discuss this task and propose different approaches for confidence estimation, together with a pipeline to evaluate such methods. We show that the most straightforward approach, a combination of different confidence measures from pipeline modules, seems not to work well. We discuss this negative result and pinpoint potential future research directions.
Anthology ID:
W17-8008
Volume:
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Editors:
Svetla Boytcheva, Kevin Bretonnel Cohen, Guergana Savova, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
55–63
URL:
https://doi.org/10.26615/978-954-452-044-1_008
DOI:
10.26615/978-954-452-044-1_008
Cite (ACL):
Camilo Thorne and Roman Klinger. 2017. Towards Confidence Estimation for Typed Protein-Protein Relation Extraction. In Proceedings of the Biomedical NLP Workshop associated with RANLP 2017, pages 55–63, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Towards Confidence Estimation for Typed Protein-Protein Relation Extraction (Thorne & Klinger, RANLP 2017)
PDF:
https://doi.org/10.26615/978-954-452-044-1_008