Creating a Gold Standard Corpus for the Extraction of Chemistry-Disease Relations from Patent Texts

Antje Schlaf, Claudia Bobach, Matthias Irmer


Abstract
This paper describes the creation of a gold standard for chemistry-disease relations in patent texts. We start with an automated annotation of named entities from the chemistry domain (e.g. “propranolol”) and the disease domain (e.g. “hypertension”), as well as from related domains such as methods and substances. After that, domain-relevant relations between these entities, e.g. “propranolol treats hypertension”, are annotated manually. The corpus is intended to be suitable for developing and evaluating relation extraction methods. In addition, we present two high-precision reasoning methods for automatically extending the set of extracted relations. Chain reasoning infers and integrates additional, indirectly expressed relations that occur in relation chains. Enumeration reasoning exploits the frequent occurrence of enumerations in patents to derive additional relations automatically. Both methods can be used to verify and extend the manually annotated data, and they may also help to improve automatic relation extraction.
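The abstract describes chain reasoning and enumeration reasoning only at a high level. The following is a minimal Python sketch of the two ideas under our own assumptions. It is not the authors' implementation. Relations are modelled as (head, label, tail) triples. The rule table `CHAIN_RULES` and its `is_a`/`treats` labels are hypothetical. Enumerations are assumed to be already detected and given as tuples of entity names.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    head: str
    label: str
    tail: str


# Hypothetical composition rule (assumption, not from the paper):
# A -is_a-> B and B -treats-> C  yields  A -treats-> C.
CHAIN_RULES = {("is_a", "treats"): "treats"}


def chain_reasoning(relations, rules=CHAIN_RULES):
    """Compose chains A-r1->B, B-r2->C into A-r3->C until nothing new is inferred."""
    inferred = set(relations)
    changed = True
    while changed:
        changed = False
        for first in list(inferred):
            for second in list(inferred):
                if first.tail != second.head:
                    continue
                label = rules.get((first.label, second.label))
                if label is None:
                    continue
                new = Relation(first.head, label, second.tail)
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


def enumeration_reasoning(relations, enumerations):
    """Copy a relation to every member of an enumeration that contains one of its arguments."""
    inferred = set(relations)
    for rel in relations:
        for members in enumerations:
            if rel.tail in members:
                # e.g. "X treats A, B and C": X treats each listed member
                inferred.update(Relation(rel.head, rel.label, m)
                                for m in members if m != rel.head)
            if rel.head in members:
                # e.g. "A, B and C treat Y": each listed member treats Y
                inferred.update(Relation(m, rel.label, rel.tail)
                                for m in members if m != rel.tail)
    return inferred


if __name__ == "__main__":
    annotated = {
        Relation("propranolol", "is_a", "beta-blocker"),
        Relation("beta-blocker", "treats", "hypertension"),
    }
    enumerations = [("hypertension", "angina", "arrhythmia")]
    for rel in sorted(enumeration_reasoning(chain_reasoning(annotated), enumerations)):
        print(rel)
```

In this toy run, chain reasoning adds “propranolol treats hypertension”, and enumeration reasoning then spreads both `treats` relations to “angina” and “arrhythmia”. A precision-oriented version would presumably restrict which relation labels may be composed or copied. The unrestricted rules shown here are a simplification.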
Anthology ID:
L14-1444
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2057–2061
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/536_Paper.pdf
Cite (ACL):
Antje Schlaf, Claudia Bobach, and Matthias Irmer. 2014. Creating a Gold Standard Corpus for the Extraction of Chemistry-Disease Relations from Patent Texts. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2057–2061, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Creating a Gold Standard Corpus for the Extraction of Chemistry-Disease Relations from Patent Texts (Schlaf et al., LREC 2014)