Proceedings of the 6th BioASQ Workshop A challenge on large-scale biomedical semantic indexing and question answering
This paper presents the results of the sixth edition of the BioASQ challenge. The BioASQ challenge aims at the promotion of systems and methodologies through the organization of a challenge on two tasks: semantic indexing and question answering. In total, 26 teams with more than 90 systems participated in this year’s challenge. As in previous years, the best systems were able to outperform the strong baselines. This suggests that state-of-the-art systems are continuously improving, pushing the frontier of research.
Question answering (QA) systems usually rely on advanced natural language processing components to precisely understand the questions and extract the answers. Semantic role labeling (SRL) is known to boost performance for QA, but its use for biomedical texts has not yet been fully studied. We analyzed the performance of three SRL tools (BioKIT, BIOSMILE and PathLSTM) on 1776 questions from the BioASQ challenge. We compared the systems regarding the coverage of the questions and snippets, as well as based on pre-defined criteria, such as easiness of installation, supported formats and usability. Finally, we integrated two of the tools in a simple QA system to further evaluate their performance over the official BioASQ test sets.
This paper describes Macquarie University’s contribution to the BioASQ Challenge (BioASQ 6b, Phase B). We focused on the extraction of the ideal answers, and the task was approached as an instance of query-based multi-document summarisation. In particular, this paper focuses on the experiments related to the deep learning and reinforcement learning approaches used in the submitted runs. The best run used a deep learning model under a regression-based framework. The deep learning architecture used features derived from the output of LSTM chains on word embeddings, plus features based on similarity with the query, and sentence position. The reinforcement learning approach was a proof-of-concept prototype that trained a global policy using REINFORCE. The global policy was implemented as a neural network that used tf.idf features encoding the candidate sentence, question, and context.
We present AUEB’s submissions to the BioASQ 6 document and snippet retrieval tasks (parts of Task 6b, Phase A). Our models use novel extensions to deep learning architectures that operate solely over the text of the query and candidate document/snippets. Our systems scored at the top or near the top for all batches of the challenge, highlighting the effectiveness of deep learning for these tasks.
Biomedical Question Answering is concerned with the development of methods and systems that automatically find answers to natural language posed questions. In this work, we describe the system used in the BioASQ Challenge task 6b for document retrieval and snippet retrieval (with particular emphasis in this subtask). The proposed model makes use of semantic similarity patterns that are evaluated and measured by a convolutional neural network architecture. Subsequently, the snippet ranking performance is improved with a pseudo-relevance feedback approach in a later step. Based on the preliminary results, we reached the second position in snippet retrieval sub-task.
There are millions of articles in PubMed database. To facilitate information retrieval, curators in the National Library of Medicine (NLM) assign a set of Medical Subject Headings (MeSH) to each article. MeSH is a hierarchically-organized vocabulary, containing about 28K different concepts, covering the fields from clinical medicine to information sciences. Several automatic MeSH indexing models have been developed to improve the time-consuming and financially expensive manual annotation, including the NLM official tool – Medical Text Indexer, and the winner of BioASQ Task5a challenge – DeepMeSH. However, these models are complex and not interpretable. We propose a novel end-to-end model, AttentionMeSH, which utilizes deep learning and attention mechanism to index MeSH terms to biomedical text. The attention mechanism enables the model to associate textual evidence with annotations, thus providing interpretability at the word level. The model also uses a novel masking mechanism to enhance accuracy and speed. In the final week of BioASQ Chanllenge Task6a, we ranked 2nd by average MiF using an on-construction model. After the contest, we achieve close to state-of-the-art MiF performance of ∼ 0.684 using our final model. Human evaluations show AttentionMeSH also provides high level of interpretability, retrieving about 90% of all expert-labeled relevant words given an MeSH-article pair at 20 output.
The growing number of biomedical publications is a challenge for human researchers, who invest considerable effort to search for relevant documents and pinpointed answers. Biomedical Question Answering can automatically generate answers for a user’s topic or question, significantly reducing the effort required to locate the most relevant information in a large document corpus. Extractive summarization techniques, which concatenate the most relevant text units drawn from multiple documents, perform well on automatic evaluation metrics like ROUGE, but score poorly on human readability, due to the presence of redundant text and grammatical errors in the answer. This work moves toward abstractive summarization, which attempts to distill and present the meaning of the original text in a more coherent way. We incorporate a sentence fusion approach, based on Integer Linear Programming, along with three novel approaches for sentence ordering, in an attempt to improve the human readability of ideal answers. Using an open framework for configuration space exploration (BOOM), we tested over 2000 unique system configurations in order to identify the best-performing combinations for the sixth edition of Phase B of the BioASQ challenge.
In this paper, we detail our submission to the BioASQ competition’s Biomedical Semantic Question and Answering task. Our system uses extractive summarization techniques to generate answers and has scored highest ROUGE-2 and Rogue-SU4 in all test batch sets. Our contributions are named-entity based method for answering factoid and list questions, and an extractive summarization techniques for building paragraph-sized summaries, based on lexical chains. Our system got highest ROUGE-2 and ROUGE-SU4 scores for ideal-type answers in all test batch sets. We also discuss the limitations of the described system, such lack of the evaluation on other criteria (e.g. manual). Also, for factoid- and list -type question our system got low accuracy (which suggests that our algorithm needs to improve in the ranking of entities).
BIOASQ Task B Phase B challenge focuses on extracting answers from snippets for a given question. The dataset provided by the organizers contains answers, but not all their variants. Henceforth a manual annotation was performed to extract all forms of correct answers. This article shows the impact of using all occurrences of correct answers for training on the evaluation scores which are improved significantly.
Ontology-Based Retrieval & Neural Approaches for BioASQ Ideal Answer Generation
Ashwin Naresh Kumar | Harini Kesavamoorthy | Madhura Das | Pramati Kalwad | Khyathi Chandu | Teruko Mitamura | Eric Nyberg
The ever-increasing magnitude of biomedical information sources makes it difficult and time-consuming for a human researcher to find the most relevant documents and pinpointed answers for a specific question or topic when using only a traditional search engine. Biomedical Question Answering systems automatically identify the most relevant documents and pinpointed answers, given an information need expressed as a natural language question. Generating a non-redundant, human-readable summary that satisfies the information need of a given biomedical question is the focus of the Ideal Answer Generation task, part of the BioASQ challenge. This paper presents a system for ideal answer generation (using ontology-based retrieval and a neural learning-to-rank approach, combined with extractive and abstractive summarization techniques) which achieved the highest ROUGE score of 0.659 on the BioASQ 5b batch 2 test.