Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study

Drahomira Herrmannova, Steven Young, Robert Patton, Christopher Stahl, Nicole Kleinstreuer, Mary Wolfe


Abstract
Identifying and extracting data elements such as study descriptors in publication full texts is a critical yet manual and labor-intensive step required in a number of tasks. In this paper we address the question of identifying data elements in an unsupervised manner. Specifically, provided a set of criteria describing specific study parameters, such as species, route of administration, and dosing regimen, we develop an unsupervised approach to identify text segments (sentences) relevant to the criteria. A binary classifier trained to identify publications that met the criteria performs better when trained on the candidate sentences than when trained on sentences randomly picked from the text, supporting the intuition that our method is able to accurately identify study descriptors.
Anthology ID:
W18-5609
Volume:
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
Month:
October
Year:
2018
Address:
Brussels, Belgium
Editors:
Alberto Lavelli, Anne-Lyse Minard, Fabio Rinaldi
Venue:
Louhi
Publisher:
Association for Computational Linguistics
Pages:
71–82
URL:
https://aclanthology.org/W18-5609
DOI:
10.18653/v1/W18-5609
Cite (ACL):
Drahomira Herrmannova, Steven Young, Robert Patton, Christopher Stahl, Nicole Kleinstreuer, and Mary Wolfe. 2018. Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study. In Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis, pages 71–82, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study (Herrmannova et al., Louhi 2018)
PDF:
https://aclanthology.org/W18-5609.pdf