Learning what to read: Focused machine reading

Enrique Noriega-Atala, Marco A. Valenzuela-Escárcega, Clayton Morrison, Mihai Surdeanu


Abstract
Recent efforts in bioinformatics have achieved tremendous progress in the machine reading of biomedical literature, and the assembly of the extracted biochemical interactions into large-scale models such as protein signaling pathways. However, batch machine reading of literature at today’s scale (PubMed alone indexes over 1 million papers per year) is unfeasible due to both cost and processing overhead. In this work, we introduce a focused reading approach to guide the machine reading of biomedical literature towards what literature should be read to answer a biomedical query as efficiently as possible. We introduce a family of algorithms for focused reading, including an intuitive, strong baseline, and a second approach which uses a reinforcement learning (RL) framework that learns when to explore (widen the search) or exploit (narrow it). We demonstrate that the RL approach is capable of answering more queries than the baseline, while being more efficient, i.e., reading fewer documents.
Anthology ID:
D17-1313
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2905–2910
URL:
https://aclanthology.org/D17-1313/
DOI:
10.18653/v1/D17-1313
Cite (ACL):
Enrique Noriega-Atala, Marco A. Valenzuela-Escárcega, Clayton Morrison, and Mihai Surdeanu. 2017. Learning what to read: Focused machine reading. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2905–2910, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Learning what to read: Focused machine reading (Noriega-Atala et al., EMNLP 2017)
PDF:
https://aclanthology.org/D17-1313.pdf