<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="2300">
    <title>BioNLP 2017</title>
    <editor>Kevin Bretonnel Cohen</editor>
    <editor>Dina Demner-Fushman</editor>
    <editor>Sophia Ananiadou</editor>
    <editor>Junichi Tsujii</editor>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-23</url>
    <bibtype>book</bibtype>
    <bibkey>BioNLP17:2017</bibkey>
  </paper>

  <paper id="2301">
    <title>Target word prediction and paraphasia classification in spoken discourse</title>
    <author><first>Joel</first><last>Adams</last></author>
    <author><first>Steven</first><last>Bedrick</last></author>
    <author><first>Gerasimos</first><last>Fergadiotis</last></author>
    <author><first>Kyle</first><last>Gorman</last></author>
    <author><first>Jan</first><last>van Santen</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;8</pages>
    <url>http://www.aclweb.org/anthology/W17-2301</url>
    <abstract>We present a system for automatically detecting and classifying phonologically
	anomalous productions in the speech of individuals with aphasia.
	Working from transcribed discourse samples, our system identifies neologisms,
	and uses a combination of string alignment and language models to produce a
	lattice of plausible words that the speaker may have intended to produce.
	We then score this lattice according to various features, and attempt to
	determine whether the anomalous production represented a phonemic error or a
	genuine neologism.
	This approach has the potential to be expanded to consider other types of
	paraphasic errors, and could be applied to a wide variety of screening and
	therapeutic applications.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>adams-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2302">
    <title>Extracting Drug-Drug Interactions with Attention CNNs</title>
    <author><first>Masaki</first><last>Asada</last></author>
    <author><first>Makoto</first><last>Miwa</last></author>
    <author><first>Yutaka</first><last>Sasaki</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>9&#8211;18</pages>
    <url>http://www.aclweb.org/anthology/W17-2302</url>
    <abstract>We propose a novel attention mechanism for a Convolutional Neural Network
	(CNN)-based Drug-Drug Interaction (DDI) extraction model. CNNs have been shown
	to have great potential for DDI extraction tasks; however, attention
	mechanisms, which emphasize important words in the sentence of a target-entity
	pair, have not been investigated with CNNs, even though attention
	mechanisms have been shown to be effective for general-domain relation
	classification. We evaluated our model on Task 9.2 of the
	DDIExtraction-2013 shared task. Our attention mechanism improved
	the performance of our base CNN-based DDI model, and the model achieved an
	F-score of 69.12%, which is competitive with the state-of-the-art models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>asada-miwa-sasaki:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2303">
    <title>Insights into Analogy Completion from the Biomedical Domain</title>
    <author><first>Denis</first><last>Newman-Griffis</last></author>
    <author><first>Albert</first><last>Lai</last></author>
    <author><first>Eric</first><last>Fosler-Lussier</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>19&#8211;28</pages>
    <url>http://www.aclweb.org/anthology/W17-2303</url>
    <abstract>Analogy completion has been a popular task in recent years for evaluating the
	semantic properties of word embeddings, but the standard methodology makes a
	number of assumptions about analogies that do not always hold, either in recent
	benchmark datasets or when expanding into other domains.  Through an analysis
	of analogies in the biomedical domain, we identify three assumptions: that of a
	Single Answer for any given analogy, that the pairs involved describe the Same
	Relationship, and that each pair is Informative with respect to the other. We
	propose modifying the standard methodology to relax these assumptions by
	allowing for multiple correct answers, reporting MAP and MRR in addition to
	accuracy, and using multiple example pairs.  We further present BMASS, a novel
	dataset for evaluating linguistic regularities in biomedical embeddings, and
	demonstrate that the relationships described in the dataset pose significant
	semantic challenges to current word embedding methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>newmangriffis-lai-foslerlussier:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2304">
    <title>Deep learning for extracting protein-protein interactions from biomedical literature</title>
    <author><first>Yifan</first><last>Peng</last></author>
    <author><first>Zhiyong</first><last>Lu</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>29&#8211;38</pages>
    <url>http://www.aclweb.org/anthology/W17-2304</url>
    <abstract>State-of-the-art methods for protein-protein interaction (PPI) extraction are
	primarily feature-based or kernel-based, leveraging lexical and syntactic
	information. How to incorporate such knowledge into recent deep learning
	methods remains an open question. In this paper, we propose a
	multichannel dependency-based convolutional neural network model (McDepCNN). It
	applies one channel to the embedding vector of each word in the sentence, and
	another channel to the embedding vector of the head of the corresponding word.
	Therefore, the model can use richer information obtained from different
	channels. Experiments on two public benchmarking datasets, AIMed and BioInfer,
	demonstrate that McDepCNN provides up to 6% F1-score improvement over rich
	feature-based methods and single-kernel methods. In addition, McDepCNN achieves
	24.4% relative improvement in F1-score over the state-of-the-art methods on
	cross-corpus evaluation and 12% improvement in F1-score over kernel-based
	methods on "difficult" instances. These results suggest that McDepCNN
	generalizes more easily over different corpora, and is capable of capturing
	long distance features in the sentences.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>peng-lu:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2305">
    <title>Stacking With Auxiliary Features for Entity Linking in the Medical Domain</title>
    <author><first>Nazneen Fatema</first><last>Rajani</last></author>
    <author><first>Mihaela</first><last>Bornea</last></author>
    <author><first>Ken</first><last>Barker</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>39&#8211;47</pages>
    <url>http://www.aclweb.org/anthology/W17-2305</url>
    <abstract>Linking spans of natural language text to concepts in a structured source is an
	important task for many problems. It allows intelligent systems to leverage
	rich knowledge available in those sources (such as concept properties and
	relations) to enhance the semantics of the mentions of these concepts in text.
	In the medical domain, it is common to link text spans to medical concepts in
	large, curated knowledge repositories such as the Unified Medical Language
	System.
	Different approaches have different strengths: some are precision-oriented,
	some recall-oriented; some better at considering context but more prone to
	hallucination. The variety of techniques suggests that ensembling could
	outperform component technologies at this task.
	In this paper, we describe our process for building a Stacking ensemble using
	additional, auxiliary features for Entity Linking in the medical domain. We
	report experiments that show that naive ensembling does not always outperform
	component Entity Linking systems, that stacking usually outperforms naive
	ensembling, and that auxiliary features added to the stacker further improve
	its performance on three distinct datasets. Our best model produces
	state-of-the-art results on several medical datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rajani-bornea-barker:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2306">
    <title>Results of the fifth edition of the BioASQ Challenge</title>
    <author><first>Anastasios</first><last>Nentidis</last></author>
    <author><first>Konstantinos</first><last>Bougiatiotis</last></author>
    <author><first>Anastasia</first><last>Krithara</last></author>
    <author><first>Georgios</first><last>Paliouras</last></author>
    <author><first>Ioannis</first><last>Kakadiaris</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>48&#8211;57</pages>
    <url>http://www.aclweb.org/anthology/W17-2306</url>
    <attachment type="presentation">W17-2306.Presentation.pdf</attachment>
    <abstract>The goal of the BioASQ challenge is to engage researchers in creating
	cutting-edge biomedical information systems. Specifically, it aims at the
	promotion of systems and methodologies that are able to deal with a plethora of
	different tasks in the biomedical domain. This is achieved through the
	organization of challenges. The fifth challenge consisted of three tasks:
	semantic indexing, question answering and a new task on information extraction.
	In total, 29 teams with more than 95 systems participated in the challenge.
	Overall, as in previous years, the best systems were able to outperform the
	strong baselines. This suggests that state-of-the-art systems are continuously
	improving, pushing the frontier of research.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nentidis-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2307">
    <title>Tackling Biomedical Text Summarization: OAQA at BioASQ 5B</title>
    <author><first>Khyathi</first><last>Chandu</last></author>
    <author><first>Aakanksha</first><last>Naik</last></author>
    <author><first>Aditya</first><last>Chandrasekar</last></author>
    <author><first>Zi</first><last>Yang</last></author>
    <author><first>Niloy</first><last>Gupta</last></author>
    <author><first>Eric</first><last>Nyberg</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>58&#8211;66</pages>
    <url>http://www.aclweb.org/anthology/W17-2307</url>
    <abstract>In this paper, we describe our participation in phase B of task 5b of the fifth
	edition of the annual BioASQ challenge, which includes answering factoid, list,
	yes-no and summary questions from biomedical data. We describe our techniques
	with an emphasis on ideal answer generation, where the goal is to produce a
	relevant, precise, non-redundant, query-oriented summary from multiple relevant
	documents. We make use of extractive summarization techniques to address this
	task and experiment with different biomedical ontologies and various algorithms
	including agglomerative clustering, Maximum Marginal Relevance (MMR) and
	sentence compression. We propose a novel word embedding based tf-idf similarity
	metric and a soft positional constraint which improve our system performance.
	We evaluate our techniques on test batch 4 from the fourth edition of the
	challenge. Our best system achieves a ROUGE-2 score of 0.6534 and ROUGE-SU4
	score of 0.6536.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chandu-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2308">
    <title>Macquarie University at BioASQ 5b &#8211; Query-based Summarisation Techniques for Selecting the Ideal Answers</title>
    <author><first>Diego</first><last>Molla</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>67&#8211;75</pages>
    <url>http://www.aclweb.org/anthology/W17-2308</url>
    <abstract>Macquarie University's contribution to the BioASQ challenge (Task 5b Phase B)
	focused on the use of query-based extractive summarisation techniques for the
	generation of the ideal answers. Four runs were submitted, with approaches
	ranging from a trivial system that selected the first $n$ snippets, to the use
	of deep learning approaches under a regression framework. Our experiments and
	the ROUGE results of the five test batches of BioASQ indicate surprisingly good
	results for the trivial approach. Overall, most of our runs on the first three
	test batches achieved the best ROUGE-SU4 results in the challenge.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>molla:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2309">
    <title>Neural Question Answering at BioASQ 5B</title>
    <author><first>Georg</first><last>Wiese</last></author>
    <author><first>Dirk</first><last>Weissenborn</last></author>
    <author><first>Mariana</first><last>Neves</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>76&#8211;79</pages>
    <url>http://www.aclweb.org/anthology/W17-2309</url>
    <attachment type="presentation">W17-2309.Presentation.pdf</attachment>
    <abstract>This paper describes our submission to the 2017 BioASQ challenge. We
	participated in Task B, Phase B which is concerned with biomedical question
	answering (QA). We focus on factoid and list questions, using an extractive QA
	model; that is, we restrict our system to output substrings of the provided
	text snippets. At the core of our system, we use FastQA, a state-of-the-art
	neural QA system. We extended it with biomedical word embeddings and changed
	its answer layer to be able to answer list questions in addition to factoid
	questions. We pre-trained the model on a large-scale open-domain QA dataset,
	SQuAD, and then fine-tuned the parameters on the BioASQ training set. With our
	approach, we achieve state-of-the-art results on factoid questions and
	competitive results on list questions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wiese-weissenborn-neves:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2310">
    <title>End-to-End System for Bacteria Habitat Extraction</title>
    <author><first>Farrokh</first><last>Mehryary</last></author>
    <author><first>Kai</first><last>Hakala</last></author>
    <author><first>Suwisa</first><last>Kaewphan</last></author>
    <author><first>Jari</first><last>Bj&#246;rne</last></author>
    <author><first>Tapio</first><last>Salakoski</last></author>
    <author><first>Filip</first><last>Ginter</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>80&#8211;90</pages>
    <url>http://www.aclweb.org/anthology/W17-2310</url>
    <abstract>We introduce an end-to-end system capable of named-entity detection,
	normalization and relation extraction for extracting information about bacteria
	and their habitats from biomedical literature. Our system is based on deep
	learning, CRF classifiers and vector space models. We train and evaluate the
	system on the BioNLP 2016 Shared Task Bacteria Biotope data. The official
	evaluation shows that the joint performance of our entity detection and
	relation extraction models outperforms the winning team of the Shared Task by
	19pp on F1-score, establishing a new top score for the task. We also achieve
	state-of-the-art results in the normalization task. Our system is open source
	and freely available at https://github.com/TurkuNLP/BHE.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mehryary-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2311">
    <title>Creation and evaluation of a dictionary-based tagger for virus species and proteins</title>
    <author><first>Helen</first><last>Cook</last></author>
    <author><first>Rudolfs</first><last>Berzins</last></author>
    <author><first>Cristina Leal</first><last>Rodr&#237;guez</last></author>
    <author><first>Juan Miguel</first><last>Cejuela</last></author>
    <author><first>Lars Juhl</first><last>Jensen</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>91&#8211;98</pages>
    <url>http://www.aclweb.org/anthology/W17-2311</url>
    <abstract>Text mining automatically extracts information from the literature with the goal
	of making it available for further analysis, for example by incorporating it
	into biomedical databases.  A key first step towards this goal is to identify
	and normalize the named entities, such as proteins and species, which are
	mentioned in text.  Despite the large detrimental impact that viruses have on
	human and agricultural health, very little previous text-mining work has
	focused on identifying virus species and proteins in the literature.  Here, we
	present an improved dictionary-based system for viral species and the first
	dictionary for viral proteins, which we benchmark on a new corpus of 300
	manually annotated abstracts.  We achieve 81.0% precision and 72.7% recall at
	the task of recognizing and normalizing viral species and 76.2% precision and
	34.9% recall on viral proteins.  These results are achieved despite the many
	challenges involved with the names of viral species and, especially, proteins. 
	This work provides a foundation that can be used to extract more complicated
	relations about viruses from the literature.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>cook-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2312">
    <title>Representation of complex terms in a vector space structured by an ontology for a normalization task</title>
    <author><first>Arnaud</first><last>Ferr&#233;</last></author>
    <author><first>Pierre</first><last>Zweigenbaum</last></author>
    <author><first>Claire</first><last>N&#233;dellec</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>99&#8211;106</pages>
    <url>http://www.aclweb.org/anthology/W17-2312</url>
    <abstract>We propose in this paper a semi-supervised method for labeling terms of texts
	with concepts of a domain ontology. The method generates continuous vector
	representations of complex terms in a semantic space structured by the
	ontology. The proposed method relies on a distributional semantics approach,
	which generates initial vectors for each of the extracted terms. Then these
	vectors are embedded in the vector space constructed from the structure of the
	ontology. This embedding is carried out by training a linear model. Finally, we
	apply a distance calculation to determine the proximity between vectors of
	terms and vectors of concepts and thus to assign ontology labels to terms. We
	have evaluated the quality of these representations for a normalization task by
	using the concepts of an ontology as semantic labels. Normalization of terms is
	an important step in extracting part of the information contained in texts, but
	the vector space generated might find other applications. The performance of
	this method is comparable to that of the state of the art for this task of
	standardization, opening up encouraging prospects.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ferre-zweigenbaum-nedellec:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2313">
    <title>Improving Correlation with Human Judgments by Integrating Semantic Similarity with Second&#8211;Order Vectors</title>
    <author><first>Bridget</first><last>McInnes</last></author>
    <author><first>Ted</first><last>Pedersen</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>107&#8211;116</pages>
    <url>http://www.aclweb.org/anthology/W17-2313</url>
    <abstract>Vector space methods that measure semantic similarity and relatedness often
	rely on distributional information such as co&#8211;occurrence frequencies or
	statistical measures of association to weight the importance of particular
	co&#8211;occurrences. In this paper, we extend these methods by incorporating a
	measure of semantic similarity based on a human curated taxonomy into a
	second&#8211;order vector representation. This results in a measure of semantic
	relatedness that combines the contextual information available in a
	corpus&#8211;based vector space representation with the semantic knowledge found in
	a biomedical ontology. Our results show that incorporating semantic similarity
	into second&#8211;order co&#8211;occurrence matrices improves correlation with human
	judgments for both similarity and relatedness, and that our method compares
	favorably to various different word embedding methods that have recently been
	evaluated on the same reference standards we have used.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mcinnes-pedersen:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2314">
    <title>Proactive Learning for Named Entity Recognition</title>
    <author><first>Maolin</first><last>Li</last></author>
    <author><first>Nhung</first><last>Nguyen</last></author>
    <author><first>Sophia</first><last>Ananiadou</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>117&#8211;125</pages>
    <url>http://www.aclweb.org/anthology/W17-2314</url>
    <abstract>The goal of active learning is to minimise the cost of producing an annotated
	dataset, in which annotators are assumed to be perfect, i.e., they always
	choose the correct labels. However, in practice, annotators are not infallible,
	and they are likely to assign incorrect labels to some instances. Proactive
	learning is a generalisation of active learning that can model different kinds
	of annotators. Although proactive learning has been applied to certain
	labelling tasks, such as text classification, there is little work on its
	application to named entity (NE) tagging. In this paper, we propose a proactive
	learning method for producing NE annotated corpora, using two annotators with
	different levels of expertise who charge different amounts based on their
	levels of experience. To optimise both cost and annotation quality, we also
	propose a mechanism to present multiple sentences to annotators at each
	iteration. Experimental results for several corpora show that our method
	facilitates the construction of high-quality NE labelled datasets at minimal
	cost.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>li-nguyen-ananiadou:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2315">
    <title>Biomedical Event Extraction using Abstract Meaning Representation</title>
    <author><first>Sudha</first><last>Rao</last></author>
    <author><first>Daniel</first><last>Marcu</last></author>
    <author><first>Kevin</first><last>Knight</last></author>
    <author><first>Hal</first><last>Daum&#233; III</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>126&#8211;135</pages>
    <url>http://www.aclweb.org/anthology/W17-2315</url>
    <abstract>We propose a novel, Abstract Meaning Representation (AMR) based approach to
	identifying molecular events/interactions in biomedical text. Our key
	contributions are: (1) an empirical validation of our hypothesis that an event
	is a subgraph of the AMR graph, (2) a neural network-based model that
	identifies such an event subgraph given an AMR, and (3) a distant supervision
	based approach to gather additional training data. We evaluate our approach on
	the 2013 Genia Event Extraction dataset and show promising results.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rao-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2316">
    <title>Detecting Personal Medication Intake in Twitter: An Annotated Corpus and Baseline Classification System</title>
    <author><first>Ari</first><last>Klein</last></author>
    <author><first>Abeed</first><last>Sarker</last></author>
    <author><first>Masoud</first><last>Rouhizadeh</last></author>
    <author><first>Karen</first><last>O'Connor</last></author>
    <author><first>Graciela</first><last>Gonzalez</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>136&#8211;142</pages>
    <url>http://www.aclweb.org/anthology/W17-2316</url>
    <abstract>Social media sites (e.g., Twitter) have been used for surveillance of drug
	safety at the population level, but studies that focus on the effects of
	medications on specific sets of individuals have had to rely on other sources
	of data. Mining social media data for this information would require the
	ability to distinguish indications of personal medication intake in this
	media. Towards that end, this paper presents an annotated corpus that can be
	used to train machine learning systems to determine whether a tweet that
	mentions a medication indicates that the individual posting has taken that
	medication at a specific time. To demonstrate the utility of the corpus as a
	training set, we present baseline results of supervised classification.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>klein-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2317">
    <title>Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings</title>
    <author><first>Pieter</first><last>Fivez</last></author>
    <author><first>Simon</first><last>Suster</last></author>
    <author><first>Walter</first><last>Daelemans</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>143&#8211;148</pages>
    <url>http://www.aclweb.org/anthology/W17-2317</url>
    <abstract>We present an unsupervised context-sensitive spelling correction method for
	clinical free-text that uses word and character n-gram embeddings. Our method
	generates misspelling replacement candidates and ranks them according to their
	semantic fit, by calculating a weighted cosine similarity between the
	vectorized representation of a candidate and the misspelling context. We
	greatly outperform two baseline off-the-shelf spelling correction tools on a
	manually annotated MIMIC-III test set, and counter the frequency bias of an
	optimized noisy channel model, showing that neural embeddings can be
	successfully exploited to include context-awareness in a spelling correction
	model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>fivez-suster-daelemans:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2318">
    <title>Characterization of Divergence in Impaired Speech of ALS Patients</title>
    <author><first>Archna</first><last>Bhatia</last></author>
    <author><first>Bonnie</first><last>Dorr</last></author>
    <author><first>Kristy</first><last>Hollingshead</last></author>
    <author><first>Samuel L.</first><last>Phillips</last></author>
    <author><first>Barbara</first><last>McKenzie</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>149&#8211;158</pages>
    <url>http://www.aclweb.org/anthology/W17-2318</url>
    <abstract>Approximately 80% to 95% of patients with Amyotrophic Lateral Sclerosis (ALS)
	eventually develop speech impairments, such as defective articulation, slow
	laborious speech and hypernasality. The relationship between impaired speech
	and asymptomatic speech may be seen as a divergence from a baseline. This
	relationship can be characterized in terms of measurable combinations of
	phonological characteristics that are indicative of the degree to which the two
	diverge. We demonstrate that divergence measurements based on phonological
	characteristics of speech correlate with physiological assessments of ALS.
	Speech-based assessments offer benefits over commonly-used physiological
	assessments in that they are inexpensive, non-intrusive, and do not require
	trained clinical personnel for administering and interpreting the results.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bhatia-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2319">
    <title>Deep Learning for Punctuation Restoration in Medical Reports</title>
    <author><first>Wael</first><last>Salloum</last></author>
    <author><first>Greg</first><last>Finley</last></author>
    <author><first>Erik</first><last>Edwards</last></author>
    <author><first>Mark</first><last>Miller</last></author>
    <author><first>David</first><last>Suendermann-Oeft</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>159&#8211;164</pages>
    <url>http://www.aclweb.org/anthology/W17-2319</url>
    <abstract>In clinical dictation, speakers try to be as concise as possible to save time,
	often resulting in utterances without explicit punctuation commands.  Since the
	end product of a dictated report, e.g. an out-patient letter, does require
	correct orthography, including exact punctuation, the latter needs to be
	restored, preferably by automated means.  This paper describes a method for
	punctuation restoration based on a state-of-the-art stack of NLP and machine
	learning techniques including B-RNNs with an attention mechanism and late
	fusion, as well as a feature extraction technique tailored to the processing of
	medical terminology using a novel vocabulary reduction model.  To the best of
	our knowledge, the resulting performance is superior to that reported in prior
	art on similar tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>salloum-EtAl:2017:BioNLP171</bibkey>
  </paper>

  <paper id="2320">
    <title>Unsupervised Domain Adaptation for Clinical Negation Detection</title>
    <author><first>Timothy</first><last>Miller</last></author>
    <author><first>Steven</first><last>Bethard</last></author>
    <author><first>Hadi</first><last>Amiri</last></author>
    <author><first>Guergana</first><last>Savova</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>165&#8211;170</pages>
    <url>http://www.aclweb.org/anthology/W17-2320</url>
    <abstract>Detecting negated concepts in clinical texts is an important part of NLP
	information extraction systems. However, generalizability of negation systems
	is lacking, as cross-domain experiments suffer dramatic performance losses. We
	examine the performance of multiple unsupervised domain adaptation algorithms
	on clinical negation detection, finding only modest gains that fall well short
	of in-domain performance.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>miller-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2321">
    <title>BioCreative VI Precision Medicine Track: creating a training corpus for mining protein-protein interactions affected by mutations</title>
    <author><first>Rezarta</first><last>Islamaj Dogan</last></author>
    <author><first>Andrew</first><last>Chatr-aryamontri</last></author>
    <author><first>Sun</first><last>Kim</last></author>
    <author><first>Chih-Hsuan</first><last>Wei</last></author>
    <author><first>Yifan</first><last>Peng</last></author>
    <author><first>Donald</first><last>Comeau</last></author>
    <author><first>Zhiyong</first><last>Lu</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>171&#8211;175</pages>
    <url>http://www.aclweb.org/anthology/W17-2321</url>
    <abstract>The Precision Medicine Track in BioCreative VI aims to bring together the
	BioNLP community for a novel challenge focused on mining the biomedical
	literature in search of mutations and protein-protein interactions (PPI). In
	order to support this track with an effective training dataset with limited
	curator time, the track organizers carefully reviewed PubMed articles from two
	different sources: curated public PPI databases, and the results of
	state-of-the-art public text mining tools. We detail here the data collection,
	manual review and annotation process and describe this training corpus
	characteristics. We also describe a corpus performance baseline. This
	analysis will provide useful information to developers and researchers for
	comparing and developing innovative text mining approaches for the BioCreative
	VI challenge and other Precision Medicine related applications.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>islamajdogan-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2322">
    <title>Painless Relation Extraction with Kindred</title>
    <author><first>Jake</first><last>Lever</last></author>
    <author><first>Steven</first><last>Jones</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>176&#8211;183</pages>
    <url>http://www.aclweb.org/anthology/W17-2322</url>
    <abstract>Relation extraction methods are essential for creating robust text mining tools
	to help researchers find useful knowledge in the vast published literature.
	Easy-to-use and generalizable methods are needed to encourage an ecosystem in
	which researchers can easily use shared resources and build upon each others'
	methods. We present the Kindred Python package for relation extraction. It
	builds upon methods from the most successful tools in the recent BioNLP Shared
	Task to produce high-quality predictions with low computational cost. It also
	integrates with PubAnnotation, PubTator, and BioNLP Shared Task data in order
	to allow easy development and application of relation extraction models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lever-jones:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2323">
    <title>Noise Reduction Methods for Distantly Supervised Biomedical Relation Extraction</title>
    <author><first>Gang</first><last>Li</last></author>
    <author><first>Cathy</first><last>Wu</last></author>
    <author><first>K.</first><last>Vijay-Shanker</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>184&#8211;193</pages>
    <url>http://www.aclweb.org/anthology/W17-2323</url>
    <abstract>Distant supervision has been applied to automatically generate labeled data for
	biomedical relation extraction. Noise exists in both positively and
	negatively-labeled data and affects the performance of supervised machine
	learning methods. In this paper, we propose three novel heuristics based on the
	notion of proximity, trigger word and confidence of patterns to leverage
	lexical and syntactic information to reduce the level of noise in the distantly
	labeled data. Experiments on three different tasks, extraction of
	protein-protein-interaction, miRNA-gene regulation relation and
	protein-localization event, show that the proposed methods can improve the
	F-score over the baseline by 6, 10 and 14 points for the three tasks,
	respectively. We also show that when the models are configured to output
	high-confidence results, high precisions can be obtained using the proposed
	methods, making them promising for facilitating manual curation for databases.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>li-wu-vijayshanker:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2324">
    <title>Role-Preserving Redaction of Medical Records to Enable Ontology-Driven Processing</title>
    <author><first>Seth</first><last>Polsley</last></author>
    <author><first>Atif</first><last>Tahir</last></author>
    <author><first>Muppala</first><last>Raju</last></author>
    <author><first>Akintayo</first><last>Akinleye</last></author>
    <author><first>Duane</first><last>Steward</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>194&#8211;199</pages>
    <url>http://www.aclweb.org/anthology/W17-2324</url>
    <abstract>Electronic medical records (EMR) have largely replaced hand-written patient
	files in healthcare. The growing pool of EMR data presents a significant
	resource in medical research, but the U.S. Health Insurance Portability and
	Accountability Act (HIPAA) mandates redacting medical records before performing
	any analysis on the same. This process complicates obtaining medical data and
	can remove much useful information from the record. As part of a larger project
	involving ontology-driven medical processing, we employ a method of recognizing
	protected health information (PHI) that maps to ontological terms. We then use
	the relationships defined in the ontology to redact medical texts so that roles
	and semantics of terms are retained without compromising anonymity. The method
	is evaluated by clinical experts on several hundred medical documents,
	achieving up to a 98.8% f-score, and has already shown promise for retaining
	semantic information in later processing.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>polsley-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2325">
    <title>Annotation of pain and anesthesia events for surgery-related processes and outcomes extraction</title>
    <author><first>Wen-wai</first><last>Yim</last></author>
    <author><first>Dario</first><last>Tedesco</last></author>
    <author><first>Catherine</first><last>Curtin</last></author>
    <author><first>Tina</first><last>Hernandez-Boussard</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>200&#8211;205</pages>
    <url>http://www.aclweb.org/anthology/W17-2325</url>
    <abstract>Pain and anesthesia information are crucial elements to identifying
	surgery-related processes and outcomes. However, pain is not consistently
	recorded in the electronic medical record. Even when recorded, the rich complex
	granularity of the pain experience may be lost. Similarly, anesthesia
	information is recorded using local electronic collection systems; though the
	accuracy and completeness of the information is unknown. We propose an
	annotation schema to capture pain, pain management, and anesthesia event
	information.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yim-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2326">
    <title>Identifying Comparative Structures in Biomedical Text</title>
    <author><first>Samir</first><last>Gupta</last></author>
    <author><first>A.S.M. Ashique</first><last>Mahmood</last></author>
    <author><first>Karen</first><last>Ross</last></author>
    <author><first>Cathy</first><last>Wu</last></author>
    <author><first>K.</first><last>Vijay-Shanker</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>206&#8211;215</pages>
    <url>http://www.aclweb.org/anthology/W17-2326</url>
    <abstract>Comparison sentences are very commonly used by authors in biomedical literature
	to report results of experiments. In such comparisons, authors typically make
	observations under two different scenarios. In this paper, we present a system
	to automatically identify such comparative sentences and their components i.e.
	the compared entities, the scale of the comparison and the aspect on which the
	entities are being compared. Our methodology is based on dependencies obtained
	by applying a parser to extract a wide range of comparison structures. We
	evaluated our system for its effectiveness in identifying comparisons and their
	components. The system achieved an F-score of 0.87 for comparison sentence
	identification and 0.77-0.81 for identifying its components.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gupta-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2327">
    <title>Tagging Funding Agencies and Grants in Scientific Articles using Sequential Learning Models</title>
    <author><first>Subhradeep</first><last>Kayal</last></author>
    <author><first>Zubair</first><last>Afzal</last></author>
    <author><first>George</first><last>Tsatsaronis</last></author>
    <author><first>Sophia</first><last>Katrenko</last></author>
    <author><first>Pascal</first><last>Coupet</last></author>
    <author><first>Marius</first><last>Doornenbal</last></author>
    <author><first>Michelle</first><last>Gregory</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>216&#8211;221</pages>
    <url>http://www.aclweb.org/anthology/W17-2327</url>
    <abstract>In this paper we present a solution for tagging funding bodies and grants in
	scientific articles using a combination of trained sequential learning models,
	namely conditional random fields (CRF), hidden Markov models (HMM) and maximum
	entropy models (MaxEnt), on a benchmark set created in-house. We apply the
	trained models to address the BioASQ challenge 5c, which is a newly introduced
	task that aims to solve the problem of funding information extraction from
	scientific articles. Results in the dry-run data set of BioASQ task 5c show
	that the suggested approach can achieve a micro-recall of more than 85% in
	tagging both funding bodies and grants.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kayal-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2328">
    <title>Deep Learning for Biomedical Information Retrieval: Learning Textual Relevance from Click Logs</title>
    <author><first>Sunil</first><last>Mohan</last></author>
    <author><first>Nicolas</first><last>Fiorini</last></author>
    <author><first>Sun</first><last>Kim</last></author>
    <author><first>Zhiyong</first><last>Lu</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>222&#8211;231</pages>
    <url>http://www.aclweb.org/anthology/W17-2328</url>
    <abstract>We describe a Deep Learning approach to modeling the relevance of a document's
	text to a query, applied to biomedical literature. Instead of mapping each
	document and query to a common semantic space, we compute a variable-length
	difference vector between the query and document which is then passed through a
	deep convolution stage followed by a deep regression network to produce the
	estimated probability of the document's relevance to the query. Despite the
	small amount of training data, this approach produces a more robust predictor
	than computing similarities between semantic vector representations of the
	query and document, and also results in significant improvements over
	traditional IR text factors. In the future, we plan to explore its application
	in improving PubMed search.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mohan-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2329">
    <title>Detecting Dementia through Retrospective Analysis of Routine Blog Posts by Bloggers with Dementia</title>
    <author><first>Vaden</first><last>Masrani</last></author>
    <author><first>Gabriel</first><last>Murray</last></author>
    <author><first>Thalia</first><last>Field</last></author>
    <author><first>Giuseppe</first><last>Carenini</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>232&#8211;237</pages>
    <url>http://www.aclweb.org/anthology/W17-2329</url>
    <abstract>We investigate if writers with dementia can be automatically distinguished from
	those without by analyzing linguistic markers in written text, in the form of
	blog posts. We have built a corpus of several thousand blog posts, some by
	people with dementia and others by people with loved ones with dementia. We use
	this dataset to train and test several machine learning methods, and achieve
	prediction performance at a level far above the baseline.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>masrani-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2330">
    <title>Protein Word Detection using Text Segmentation Techniques</title>
    <author><first>Devi</first><last>Ganesan</last></author>
    <author><first>Ashish V.</first><last>Tendulkar</last></author>
    <author><first>Sutanu</first><last>Chakraborti</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>238&#8211;246</pages>
    <url>http://www.aclweb.org/anthology/W17-2330</url>
    <abstract>Literature in Molecular Biology is abundant with linguistic metaphors. There
	have been works in the past that attempt to draw parallels between linguistics
	and biology, driven by the fundamental premise that proteins have a language of
	their own. Since word detection is crucial to the decipherment of any unknown
	language, we attempt to establish a problem mapping from natural language text
	to protein sequences at the level of words. Towards this end, we explore the
	use of an unsupervised text segmentation algorithm to the task of extracting
	"biological words" from protein sequences. In particular, we demonstrate the
	effectiveness of using domain knowledge to complement data driven approaches in
	the text segmentation task, as well as in its biological counterpart. We also
	propose a novel extrinsic evaluation measure for protein words through protein
	family classification.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ganesan-tendulkar-chakraborti:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2331">
    <title>External Evaluation of Event Extraction Classifiers for Automatic Pathway Curation: An extended study of the mTOR pathway</title>
    <author><first>Wojciech</first><last>Kusa</last></author>
    <author><first>Michael</first><last>Spranger</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>247&#8211;256</pages>
    <url>http://www.aclweb.org/anthology/W17-2331</url>
    <abstract>This paper evaluates the impact of various event extraction systems on
	automatic pathway curation using the popular mTOR pathway. We quantify the
	impact of training data sets as well as different machine learning classifiers
	and show that some improve the quality of automatically extracted pathways.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kusa-spranger:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2332">
    <title>Toward Automated Early Sepsis Alerting: Identifying Infection Patients from Nursing Notes</title>
    <author><first>Emilia</first><last>Apostolova</last></author>
    <author><first>Tom</first><last>Velez</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>257&#8211;262</pages>
    <url>http://www.aclweb.org/anthology/W17-2332</url>
    <abstract>Severe sepsis and septic shock are conditions that affect millions of patients
	and have close to 50% mortality rate. Early identification of at-risk patients
	significantly improves outcomes. Electronic surveillance tools have been
	developed to monitor structured Electronic Medical Records and automatically
	recognize early signs of sepsis. However, many sepsis risk factors (e.g.
	symptoms and signs of infection) are often captured only in free text clinical
	notes. In this study, we developed a method for automatic monitoring of nursing
	notes for signs and symptoms of infection. We utilized a creative approach to
	automatically generate an annotated dataset. The dataset was used to create a
	Machine Learning model that achieved an F1-score ranging from 79 to 96%.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>apostolova-velez:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2333">
    <title>Enhancing Automatic ICD-9-CM Code Assignment for Medical Texts with PubMed</title>
    <author><first>Danchen</first><last>Zhang</last></author>
    <author><first>Daqing</first><last>He</last></author>
    <author><first>Sanqiang</first><last>Zhao</last></author>
    <author><first>Lei</first><last>Li</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>263&#8211;271</pages>
    <url>http://www.aclweb.org/anthology/W17-2333</url>
    <abstract>Assigning a standard ICD-9-CM code to disease symptoms in medical texts is an
	important task in the medical domain. Automating this process could greatly
	reduce the costs. However, the effectiveness of an automatic ICD-9-CM code
	classifier faces a serious problem, which can be triggered by unbalanced
	training data. Frequent diseases often have more training data, which helps its
	classification to perform better than that of an infrequent disease. However, a
	disease’s frequency does not necessarily reflect its importance. To resolve
	this training data shortage problem, we propose to strategically draw data from
	PubMed to enrich the training data when there is such need. We validate our
	method on the CMC dataset, and the evaluation results indicate that our method
	can significantly improve the code assignment classifiers' performance at the
	macro-averaging level.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhang-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2334">
    <title>Evaluating Feature Extraction Methods for Knowledge-based Biomedical Word Sense Disambiguation</title>
    <author><first>Sam</first><last>Henry</last></author>
    <author><first>Clint</first><last>Cuffy</last></author>
    <author><first>Bridget</first><last>McInnes</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>272&#8211;281</pages>
    <url>http://www.aclweb.org/anthology/W17-2334</url>
    <abstract>In this paper, we present an analysis of feature extraction methods via
	dimensionality reduction for the task of biomedical Word Sense Disambiguation
	(WSD). We modify the vector representations in the 2-MRD WSD algorithm, and
	evaluate four dimensionality reduction methods: Word Embeddings using
	Continuous Bag of Words and Skip Gram, Singular Value Decomposition (SVD), and
	Principal Component Analysis (PCA). We also evaluate the effects of vector size
	on the performance of each of these methods. Results are evaluated on five
	standard evaluation datasets (Abbrev.100, Abbrev.200, Abbrev.300, NLM-WSD, and
	MSH-WSD). We find that vector sizes of 100 are sufficient for all techniques
	except SVD, for which a vector size of 1500 is preferred. We also show that SVD
	performs on par with Word Embeddings for all but one dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>henry-cuffy-mcinnes:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2335">
    <title>Investigating the Documentation of Electronic Cigarette Use in the Veteran Affairs Electronic Health Record: A Pilot Study</title>
    <author><first>Danielle</first><last>Mowery</last></author>
    <author><first>Brett</first><last>South</last></author>
    <author><first>Olga</first><last>Patterson</last></author>
    <author><first>Shu-Hong</first><last>Zhu</last></author>
    <author><first>Mike</first><last>Conway</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>282&#8211;286</pages>
    <url>http://www.aclweb.org/anthology/W17-2335</url>
    <abstract>In this paper, we present pilot work on characterising the
	documentation of electronic cigarettes (e-cigarettes) in the United
	States Veterans Administration Electronic Health Record. The Veterans
	Health Administration is the largest health care system in the United States
	with 1,233 health care facilities nationwide, serving 8.9 million
	veterans per year. We identified a random sample of 2000 Veterans
	Administration patients, coded as current tobacco users, from 2008 to
	2014. Using simple keyword matching techniques combined with
	qualitative analysis, we investigated the prevalence and distribution of
	e-cigarette terms in these clinical notes, discovering that for
	current smokers, 11.9% of patient records contain an e-cigarette related
	term.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mowery-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2336">
    <title>Automated Preamble Detection in Dictated Medical Reports</title>
    <author><first>Wael</first><last>Salloum</last></author>
    <author><first>Greg</first><last>Finley</last></author>
    <author><first>Erik</first><last>Edwards</last></author>
    <author><first>Mark</first><last>Miller</last></author>
    <author><first>David</first><last>Suendermann-Oeft</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>287&#8211;295</pages>
    <url>http://www.aclweb.org/anthology/W17-2336</url>
    <abstract>Dictated medical reports very often feature a preamble containing
	metainformation about the report such as patient and physician names, location
	and name of the clinic, date of procedure, and so on. In the medical
	transcription process, the preamble is usually omitted from the final report,
	as it contains information already available in the electronic medical record.
	We present a method which is able to automatically identify preambles in
	medical dictations. The method makes use of state-of-the-art NLP techniques
	including word embeddings and Bi-LSTMs and achieves preamble detection
	performance superior to humans.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>salloum-EtAl:2017:BioNLP172</bibkey>
  </paper>

  <paper id="2337">
    <title>A Biomedical Question Answering System in BioASQ 2017</title>
    <author><first>Mourad</first><last>Sarrouti</last></author>
    <author><first>Said</first><last>Ouatik El Alaoui</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>296&#8211;301</pages>
    <url>http://www.aclweb.org/anthology/W17-2337</url>
    <abstract>Question answering, the identification of short accurate answers to users'
	questions, is a longstanding challenge widely studied over the last decades in
	the open domain. However, it still requires further efforts in the biomedical
	domain. In this paper, we describe our participation in phase B of task 5b in
	the 2017 BioASQ challenge using our biomedical question answering system. Our
	system, dealing with four types of questions (i.e., yes/no, factoid, list, and
	summary), is based on (1) a dictionary-based approach for generating the exact
	answers of yes/no questions, (2) UMLS metathesaurus and term frequency metric
	for extracting the exact answers of factoid and list questions, and (3) the
	BM25 model and UMLS concepts for retrieving the ideal answers (i.e.,
	paragraph-sized summaries). Preliminary results show that our system achieves
	good and competitive results in both exact and ideal answers extraction tasks
	as compared with the participating systems.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sarrouti-ouatikelalaoui:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2338">
    <title>Adapting Pre-trained Word Embeddings For Use In Medical Coding</title>
    <author><first>Kevin</first><last>Patel</last></author>
    <author><first>Divya</first><last>Patel</last></author>
    <author><first>Mansi</first><last>Golakiya</last></author>
    <author><first>Pushpak</first><last>Bhattacharyya</last></author>
    <author><first>Nilesh</first><last>Birari</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>302&#8211;306</pages>
    <url>http://www.aclweb.org/anthology/W17-2338</url>
    <abstract>Word embeddings are a crucial component in modern NLP. Pre-trained embeddings
	released by different groups have been a major reason for their popularity.
	However, they are trained on generic corpora, which limits their direct use for
	domain specific tasks. In this paper, we propose a method to add task specific
	information to pre-trained word embeddings. Such information can improve their
	utility. We add information from medical coding data, as well as the first
	level from the hierarchy of ICD-10 medical code set to different pre-trained
	word embeddings. We adapt CBOW algorithm from the word2vec package for our
	purpose. We evaluated our approach on five different pre-trained word
	embeddings. Both the original word embeddings, and their modified versions (the
	ones with added information) were used for automated review of medical coding.
	The modified word embeddings give an improvement in f-score by 1% on the
	5-fold evaluation on a private medical claims dataset. Our results show that
	adding extra information is possible and beneficial for the task at hand.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>patel-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2339">
    <title>Initializing neural networks for hierarchical multi-label text classification</title>
    <author><first>Simon</first><last>Baker</last></author>
    <author><first>Anna</first><last>Korhonen</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>307&#8211;315</pages>
    <url>http://www.aclweb.org/anthology/W17-2339</url>
    <abstract>Many tasks in the biomedical domain require the assignment of one or more
	predefined labels to input text, where the labels are a part of a hierarchical
	structure (such as a taxonomy). The conventional approach is to use a
	one-vs.-rest (OVR) classification setup, where a binary classifier is trained
	for each label in the taxonomy or ontology where all instances not belonging to
	the class are considered negative examples. The main drawbacks to this approach
	are that dependencies between classes are not leveraged in the training and
	classification process, and the additional computational cost of training
	parallel classifiers. In this paper, we apply a new method for hierarchical
	multi-label text classification that initializes a neural network model final
	hidden layer such that it leverages label co-occurrence relations such as
	hypernymy. This approach elegantly lends itself to hierarchical classification.
	We evaluated this approach using two hierarchical multi-label text
	classification tasks in the biomedical domain using both sentence- and
	document-level classification. Our evaluation shows promising results for this
	approach.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>baker-korhonen:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2340">
    <title>Biomedical Event Trigger Identification Using Bidirectional Recurrent Neural Network Based Models</title>
    <author><first>Rahul</first><last>V S S Patchigolla</last></author>
    <author><first>Sunil</first><last>Sahu</last></author>
    <author><first>Ashish</first><last>Anand</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>316&#8211;321</pages>
    <url>http://www.aclweb.org/anthology/W17-2340</url>
    <abstract>Biomedical events describe complex interactions between various biomedical
	entities. Event trigger is a word or a phrase which typically signifies the
	occurrence of an event. Event trigger identification is an important first step
	in all event extraction methods. However many of the current approaches either
	rely on complex hand-crafted features or consider features only within a
	window. In this paper, we propose a method that takes advantage of a recurrent
	neural network (RNN) to extract higher-level features present across the
	sentence. Using the hidden state representation of the RNN, along with word and
	entity type embeddings, as features avoids relying on the complex hand-crafted
	features generated using various NLP toolkits. Our experiments achieve a
	state-of-the-art F1-score on the Multi Level Event Extraction (MLEE) corpus. We
	also perform a category-wise analysis of the results and discuss the
	importance of various features in the trigger identification task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>vsspatchigolla-sahu-anand:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2341">
    <title>Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks</title>
    <author><first>Chen</first><last>Lin</last></author>
    <author><first>Timothy</first><last>Miller</last></author>
    <author><first>Dmitriy</first><last>Dligach</last></author>
    <author><first>Steven</first><last>Bethard</last></author>
    <author><first>Guergana</first><last>Savova</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>322&#8211;327</pages>
    <url>http://www.aclweb.org/anthology/W17-2341</url>
    <abstract>Token sequences are often used as the input for Convolutional Neural Networks
	(CNNs) in natural language processing. However, they might not be an ideal
	representation for time expressions, which are long, highly varied, and
	semantically complex. We describe a method for representing time expressions
	with single pseudo-tokens for CNNs. With this method, we establish a new
	state-of-the-art result for a clinical temporal relation extraction task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lin-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2342">
    <title>Automatic Diagnosis Coding of Radiology Reports: A Comparison of Deep Learning and Conventional Classification Methods</title>
    <author><first>Sarvnaz</first><last>Karimi</last></author>
    <author><first>Xiang</first><last>Dai</last></author>
    <author><first>Hamedh</first><last>Hassanzadeh</last></author>
    <author><first>Anthony</first><last>Nguyen</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>328&#8211;332</pages>
    <url>http://www.aclweb.org/anthology/W17-2342</url>
    <abstract>Diagnosis autocoding services and research aim to improve both the
	productivity of clinical coders and the accuracy of the coding. Autocoding is an
	important step in data analysis for funding and reimbursement, as well as
	health services planning and resource allocation. We investigate the
	applicability of deep learning to autocoding of radiology reports using the
	International Classification of Diseases (ICD). Deep learning methods are known
	to require large amounts of training data. Our goal is to explore how to use these
	methods when the training data is sparse, skewed and relatively small, and how
	their effectiveness compares to conventional methods. We identify optimal
	parameters that could be used in setting up a convolutional neural network for
	autocoding with results comparable to those of conventional methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>karimi-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2343">
    <title>Automatic classification of doctor-patient questions for a virtual patient record query task</title>
    <author><first>Leonardo</first><last>Campillos Llanos</last></author>
    <author><first>Sophie</first><last>Rosset</last></author>
    <author><first>Pierre</first><last>Zweigenbaum</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>333&#8211;341</pages>
    <url>http://www.aclweb.org/anthology/W17-2343</url>
    <abstract>We present work in progress on automating the classification of
	doctor-patient questions in the context of a simulated consultation with a
	virtual patient. We classify questions according to the computational strategy
	(rule-based or other) needed for looking up data in the clinical record. We
	compare ‘traditional’ machine learning methods (Gaussian and Multinomial
	Naive Bayes, and Support Vector Machines) and a neural network classifier
	(FastText). We obtained the best results with the SVM using semantic
	annotations, whereas the neural classifier achieved promising results without
	them.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>campillosllanos-rosset-zweigenbaum:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2344">
    <title>Assessing the performance of Olelo, a real-time biomedical question answering application</title>
    <author><first>Mariana</first><last>Neves</last></author>
    <author><first>Fabian</first><last>Eckert</last></author>
    <author><first>Hendrik</first><last>Folkerts</last></author>
    <author><first>Matthias</first><last>Uflacker</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>342&#8211;350</pages>
    <url>http://www.aclweb.org/anthology/W17-2344</url>
    <attachment type="poster">W17-2344.Poster.pdf</attachment>
    <abstract>Question answering (QA) can help physicians and biomedical researchers
	find answers to their questions in the scientific literature. Such systems
	process large collections of documents in real time and include many natural
	language processing (NLP) procedures. We recently developed Olelo, a QA system
	for biomedicine which includes various NLP components,
	such as question processing, document and passage retrieval, answer processing
	and multi-document summarization. In this work, we present an evaluation of our
	system on the fifth BioASQ challenge. We participated with the current
	state of the application and with an extension based on semantic role labeling
	that we are currently investigating. In addition
	to the BioASQ evaluation, we compared our system to other on-line biomedical QA
	systems in terms of the response time and the quality of the answers.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>neves-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2345">
    <title>Clinical Event Detection with Hybrid Neural Architecture</title>
    <author><first>Adyasha</first><last>Maharana</last></author>
    <author><first>Meliha</first><last>Yetisgen</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>351&#8211;355</pages>
    <url>http://www.aclweb.org/anthology/W17-2345</url>
    <abstract>Event detection from clinical notes has traditionally been solved with
	rule-based and statistical natural language processing (NLP) approaches that
	require extensive domain knowledge and feature engineering. In this paper, we
	explore the feasibility of approaching this task with recurrent neural
	networks and clinical word embeddings, and introduce a hybrid architecture to
	improve detection for entities with smaller representation in the dataset. A
	comparative analysis reveals the complementary behavior of neural networks and
	conditional random fields in clinical entity detection.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>maharana-yetisgen:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2346">
    <title>Extracting Personal Medical Events for User Timeline Construction using Minimal Supervision</title>
    <author><first>Aakanksha</first><last>Naik</last></author>
    <author><first>Chris</first><last>Bogart</last></author>
    <author><first>Carolyn</first><last>Rose</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>356&#8211;364</pages>
    <url>http://www.aclweb.org/anthology/W17-2346</url>
    <abstract>In this paper, we describe a system for automatic construction of users' disease
	progression timelines from their posts in online support groups using minimal
	supervision. In recent years, several online support groups have been
	established, which has led to a huge increase in the amount of patient-authored
	text available. Creating systems which can automatically extract important
	medical events and create disease progression timelines for users from such
	text can help in patient health monitoring as well as studying links between
	medical events and users' participation in support groups. Prior work in this
	domain has used manually constructed keyword sets to detect medical events. In
	this work, our aim is to perform medical event detection using minimal
	supervision in order to develop a more general timeline construction system.
	Our system achieves an accuracy of 55.17%, which is 92% of the performance
	achieved by a supervised baseline system.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>naik-bogart-rose:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2347">
    <title>Detecting mentions of pain and acute confusion in Finnish clinical text</title>
    <author><first>Hans</first><last>Moen</last></author>
    <author><first>Kai</first><last>Hakala</last></author>
    <author><first>Farrokh</first><last>Mehryary</last></author>
    <author><first>Laura-Maria</first><last>Peltonen</last></author>
    <author><first>Tapio</first><last>Salakoski</last></author>
    <author><first>Filip</first><last>Ginter</last></author>
    <author><first>Sanna</first><last>Salanter&#228;</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>365&#8211;372</pages>
    <url>http://www.aclweb.org/anthology/W17-2347</url>
    <abstract>We study and compare two different approaches to the task of automatic
	assignment of predefined classes to clinical free-text narratives. In the first
	approach this is treated as a traditional mention-level named-entity
	recognition task, while the second approach treats it as a sentence-level
	multi-label classification task. Performance comparison across these two
	approaches is conducted in the form of sentence-level evaluation and
	state-of-the-art methods for both approaches are evaluated. The experiments are
	done on two data sets consisting of Finnish clinical text, manually annotated
	with respect to the topics pain and acute confusion. Our results suggest that
	the mention-level named-entity recognition approach outperforms sentence-level
	classification overall, but the latter approach still manages to achieve the
	best prediction scores on several annotation classes.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>moen-EtAl:2017:BioNLP17</bibkey>
  </paper>

  <paper id="2348">
    <title>A Multi-strategy Query Processing Approach for Biomedical Question Answering: USTB_PRIR at BioASQ 2017 Task 5B</title>
    <author><first>Zan-Xia</first><last>Jin</last></author>
    <author><first>Bo-Wen</first><last>Zhang</last></author>
    <author><first>Fan</first><last>Fang</last></author>
    <author><first>Le-Le</first><last>Zhang</last></author>
    <author><first>Xu-Cheng</first><last>Yin</last></author>
    <booktitle>BioNLP 2017</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada,</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>373&#8211;380</pages>
    <url>http://www.aclweb.org/anthology/W17-2348</url>
    <abstract>This paper describes the participation of the USTB_PRIR team in BioASQ 2017
	Task 5B on question answering, including the document retrieval, snippet
	retrieval, and concept retrieval tasks. We introduce different multimodal query
	processing strategies to enrich query terms and assign different weights to them.
	Specifically, sequential dependence model (SDM), pseudo-relevance feedback
	(PRF), fielded sequential dependence model (FSDM) and Divergence from
	Randomness model (DFRM) are respectively applied to different fields of
	PubMed articles, sentences extracted from relevant articles, and the five
	terminologies or ontologies (MeSH, GO, Jochem, Uniprot and DO) to achieve
	better search performance. Preliminary results show that our systems
	outperform others in the document and snippet retrieval tasks in the first two
	batches.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jin-EtAl:2017:BioNLP17</bibkey>
  </paper>

</volume>

