Zach Jensen
2021
MS-Mentions: Consistently Annotating Entity Mentions in Materials Science Procedural Text
Tim O’Gorman | Zach Jensen | Sheshera Mysore | Kevin Huang | Rubayyat Mahbub | Elsa Olivetti | Andrew McCallum
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Material science synthesis procedures are a promising domain for scientific NLP, as proper modeling of these recipes could provide insight into new ways of creating materials. However, a fundamental challenge in building information extraction models for material science synthesis procedures is getting accurate labels for the materials, operations, and other entities of those procedures. We present a new corpus of entity mention annotations over 595 material science synthesis procedural texts (157,488 tokens), which greatly expands the training data available for the Named Entity Recognition task. We outline a new label inventory designed to provide consistent annotations and a new annotation approach intended to maximize the consistency and annotation speed of domain experts. Inter-annotator agreement studies and baseline models trained on the data suggest that the corpus provides high-quality annotations of these mention types. This corpus helps lay a foundation for future high-quality modeling of synthesis procedures.