Query selection methods for automated corpora construction with a use case in food-drug interactions

Georgeta Bordea, Tsanta Randriatsitohaina, Fleur Mougin, Natalia Grabar, Thierry Hamon


Abstract
In this paper, we address the problem of automatically constructing a relevant corpus of scientific articles about food-drug interactions. A growing number of scientific publications describe food-drug interactions, but building a high-coverage corpus that can be used for information extraction is currently not trivial. We investigate several methods for automating the query selection process using an expert-curated corpus of food-drug interactions. Our experiments show that index term features combined with a decision tree classifier are the best approach for this task, and that feature selection approaches, in particular gain ratio, outperform frequency-based methods for query selection.
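To make the gain ratio criterion named in the abstract concrete, the following is a minimal sketch of ranking candidate query terms by gain ratio (information gain divided by split information) over a small labeled document set. It is not the authors' implementation: the function names, the binary present/absent encoding of index terms, and the toy documents and labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(has_term, labels):
    """Gain ratio of a binary feature (term present or absent) with respect to labels."""
    total = len(labels)
    with_term = [y for x, y in zip(has_term, labels) if x]
    without_term = [y for x, y in zip(has_term, labels) if not x]
    # Entropy of the labels that remains after splitting on the term.
    conditional = (len(with_term) / total) * entropy(with_term) \
        + (len(without_term) / total) * entropy(without_term)
    info_gain = entropy(labels) - conditional
    # Split information penalizes terms that split the corpus very unevenly.
    split_info = entropy(has_term)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_terms(docs, labels):
    """Rank every index term in docs (a list of term sets) by descending gain ratio."""
    vocabulary = sorted(set().union(*docs))
    scores = {t: gain_ratio([t in d for d in docs], labels) for t in vocabulary}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: each document is a set of index terms,
# and label 1 marks a document judged relevant to food-drug interactions.
docs = [
    {"Food-Drug Interactions", "Grapefruit"},
    {"Food-Drug Interactions", "Humans"},
    {"Humans", "Neoplasms"},
    {"Humans", "Grapefruit"},
]
labels = [1, 1, 0, 0]

ranking = rank_terms(docs, labels)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

In this toy setting, the term that separates relevant from irrelevant documents perfectly ranks first, while a term that occurs equally often in both classes scores zero. A frequency-based ranking would not make that distinction, since both terms occur in two documents.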
Anthology ID: W19-5013
Volume: Proceedings of the 18th BioNLP Workshop and Shared Task
Month: August
Year: 2019
Address: Florence, Italy
Editors: Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue: BioNLP
SIG: SIGBIOMED
Publisher: Association for Computational Linguistics
Pages: 115–124
URL: https://aclanthology.org/W19-5013/
DOI: 10.18653/v1/W19-5013
Cite (ACL): Georgeta Bordea, Tsanta Randriatsitohaina, Fleur Mougin, Natalia Grabar, and Thierry Hamon. 2019. Query selection methods for automated corpora construction with a use case in food-drug interactions. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 115–124, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Query selection methods for automated corpora construction with a use case in food-drug interactions (Bordea et al., BioNLP 2019)
PDF: https://aclanthology.org/W19-5013.pdf