BioMedLAT Corpus: Annotation of the Lexical Answer Type for Biomedical Questions

Mariana Neves, Milena Kraus


Abstract
Question answering (QA) systems need to provide exact answers to the questions posed to them. However, this can only be achieved through precise processing of the question. One important step in this processing is detecting the expected answer type, which involves extracting the headword of the question and identifying its semantic type. We annotated the headword of 643 factoid and list questions from the BioASQ training data and assigned UMLS semantic types to them. We present statistics on the corpus and a preliminary evaluation based on baseline experiments. We also discuss the challenges in both the manual annotation and the automatic detection of the headwords and semantic types. We believe that this is a valuable resource for both training and evaluating biomedical QA systems. The corpus is available at: https://github.com/mariananeves/BioMedLAT.
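To make the two-step idea concrete (find the headword, then map it to a semantic type), the following is a minimal sketch under loudly stated assumptions. It is not the authors' baseline: the cue-word heuristic, the small `LEXICON`, and the function names `extract_headword` and `lexical_answer_type` are all hypothetical. The type labels are real UMLS semantic type names, but a realistic system would look terms up in UMLS itself rather than in a hand-made dictionary.

```python
import re
from typing import Optional, Tuple

# Hypothetical toy lexicon mapping headwords to UMLS semantic type names.
# A real system would look terms up in UMLS itself; these entries are illustrative.
LEXICON = {
    "gene": "Gene or Genome",
    "genes": "Gene or Genome",
    "protein": "Amino Acid, Peptide, or Protein",
    "proteins": "Amino Acid, Peptide, or Protein",
    "drug": "Pharmacologic Substance",
    "drugs": "Pharmacologic Substance",
    "disease": "Disease or Syndrome",
    "diseases": "Disease or Syndrome",
}

# Assumed question cues that often precede the headword in factoid/list questions.
CUES = {"which", "what", "list", "name"}
# Function words skipped while searching for the headword (assumption).
SKIP = {"is", "are", "was", "were", "the", "a", "an", "all"}


def extract_headword(question: str) -> Optional[str]:
    """Return the first non-skipped token after a leading question cue, if any."""
    tokens = re.findall(r"[a-z0-9-]+", question.lower())
    if not tokens or tokens[0] not in CUES:
        return None  # e.g., yes/no questions starting with "Is" or "Does"
    for token in tokens[1:]:
        if token not in SKIP:
            return token
    return None


def lexical_answer_type(question: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (headword, semantic type); the type is None if the headword is unknown."""
    headword = extract_headword(question)
    semantic_type = LEXICON.get(headword) if headword is not None else None
    return headword, semantic_type


if __name__ == "__main__":
    for q in [
        "Which gene is mutated in cystic fibrosis?",
        "List the proteins that interact with p53.",
        "What is the role of TP53 in apoptosis?",
        "Is BRCA1 a tumor suppressor?",
    ]:
        print(q, "->", lexical_answer_type(q))
```

A question such as "What is the role of TP53 in apoptosis?" yields the headword "role", which has no entry in the toy lexicon. Cases like this show why naive heuristics fall short and why a manually annotated corpus is useful for training and evaluation.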
Anthology ID:
W16-4407
Volume:
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Key-Sun Choi, Christina Unger, Piek Vossen, Jin-Dong Kim, Noriko Kando, Axel-Cyrille Ngonga Ngomo
Venue:
WS
Publisher:
The COLING 2016 Organizing Committee
Pages:
49–58
URL:
https://aclanthology.org/W16-4407/
Cite (ACL):
Mariana Neves and Milena Kraus. 2016. BioMedLAT Corpus: Annotation of the Lexical Answer Type for Biomedical Questions. In Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016), pages 49–58, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
BioMedLAT Corpus: Annotation of the Lexical Answer Type for Biomedical Questions (Neves & Kraus, 2016)
PDF:
https://aclanthology.org/W16-4407.pdf
Code:
mariananeves/BioMedLAT
Data:
BioASQ