<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W18">
  <paper id="5600">
    <title>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</title>
    <editor>Alberto Lavelli</editor>
    <editor>Anne-Lyse Minard</editor>
    <editor>Fabio Rinaldi</editor>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W18-56</url>
    <bibtype>book</bibtype>
    <bibkey>LOUHI:2018</bibkey>
  </paper>

  <paper id="5601">
    <title>Detecting Diabetes Risk from Social Media Activity</title>
    <author><first>Dane</first><last>Bell</last></author>
    <author><first>Egoitz</first><last>Laparra</last></author>
    <author><first>Aditya</first><last>Kousik</last></author>
    <author><first>Terron</first><last>Ishihara</last></author>
    <author><first>Mihai</first><last>Surdeanu</last></author>
    <author><first>Stephen</first><last>Kobourov</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;11</pages>
    <url>http://www.aclweb.org/anthology/W18-5601</url>
    <abstract>This work is the first to explore the detection of individuals' risk of type 2 diabetes mellitus (T2DM) directly from their social media (Twitter) activity. Our approach extends a deep learning architecture with several contributions: following previous observations that language use differs by gender, it captures and uses gender information through domain adaptation; it captures recency of posts under the hypothesis that more recent posts are more representative of an individual's current risk status; and, lastly, it demonstrates that in this scenario, where activity factors are sparsely represented in the data, a bag-of-words neural network model using custom dictionaries of food and activity words performs better than other neural sequence models. Our best model, which incorporates all these contributions, achieves a risk-detection F1 of 41.9, considerably higher than the baseline rate (36.9).</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bell-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5602">
    <title>Treatment Side Effect Prediction from Online User-generated Content</title>
    <author><first>Hoang</first><last>Nguyen</last></author>
    <author><first>Kazunari</first><last>Sugiyama</last></author>
    <author><first>Min-Yen</first><last>Kan</last></author>
    <author><first>Kishaloy</first><last>Halder</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>12&#8211;21</pages>
    <url>http://www.aclweb.org/anthology/W18-5602</url>
    <abstract>With Health 2.0, patients and caregivers increasingly seek information regarding possible drug side effects during their medical treatments in online health communities. These online communities are helpful platforms for non-professional medical opinions, yet they pose the risk of being unreliable in quality and insufficient in quantity to cover the wide range of potential drug reactions. Current approaches to analysing such user-generated content in online forums rely heavily on feature engineering of both documents and users, and often overlook the relationships between posts within a common discussion thread. Inspired by recent advancements, we propose a neural architecture that models the textual content of user-generated documents and user experiences in online communities to predict side effects during treatment. Experimental results show that our proposed architecture outperforms baseline models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nguyen-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5603">
    <title>Revisiting neural relation classification in clinical notes with external information</title>
    <author><first>Simon</first><last>Suster</last></author>
    <author><first>Madhumita</first><last>Sushil</last></author>
    <author><first>Walter</first><last>Daelemans</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>22&#8211;28</pages>
    <url>http://www.aclweb.org/anthology/W18-5603</url>
    <abstract>Recently, segment convolutional neural networks have been proposed for end-to-end relation extraction in the clinical domain, achieving results comparable to or outperforming the approaches with heavy manual feature engineering. In this paper, we analyze the errors made by the neural classifier based on confusion matrices, and then investigate three simple extensions to overcome its limitations. We find that including ontological association between drugs and problems, and data-induced association between medical concepts does not reliably improve the performance, but that large gains are obtained by the incorporation of semantic classes to capture relation triggers.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>suster-sushil-daelemans:2018:LOUHI</bibkey>
  </paper>

  <paper id="5604">
    <title>Supervised Machine Learning for Extractive Query Based Summarisation of Biomedical Data</title>
    <author><first>Mandeep</first><last>Kaur</last></author>
    <author><first>Diego</first><last>Molla</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>29&#8211;37</pages>
    <url>http://www.aclweb.org/anthology/W18-5604</url>
    <abstract>The automation of text summarisation of biomedical publications is a pressing need due to the plethora of information available on-line. This paper explores the impact of several supervised machine learning approaches for extracting multi-document summaries for given queries. In particular, we compare classification and regression approaches for query-based extractive summarisation using data provided by the BioASQ Challenge. We tackle the problem of annotating sentences for training classification systems and show that a simple annotation approach outperforms regression-based summarisation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kaur-molla:2018:LOUHI</bibkey>
  </paper>

  <paper id="5605">
    <title>Comparing CNN and LSTM character-level embeddings in BiLSTM-CRF models for chemical and disease named entity recognition</title>
    <author><first>Zenan</first><last>Zhai</last></author>
    <author><first>Dat Quoc</first><last>Nguyen</last></author>
    <author><first>Karin</first><last>Verspoor</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>38&#8211;43</pages>
    <url>http://www.aclweb.org/anthology/W18-5605</url>
    <abstract>We compare the use of LSTM-based and CNN-based character-level word embeddings in BiLSTM-CRF models to approach chemical and disease named entity recognition (NER) tasks. Empirical results over the BioCreative V CDR corpus show that the use of either type of character-level word embeddings in conjunction with the BiLSTM-CRF models leads to comparable state-of-the-art performance. However, the models using CNN-based character-level word embeddings have a computational performance advantage, increasing training time over word-based models by only 25%, while the LSTM-based character-level word embeddings more than double the required training time.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhai-nguyen-verspoor:2018:LOUHI</bibkey>
  </paper>

  <paper id="5606">
    <title>Deep learning for language understanding of mental health concepts derived from Cognitive Behavioural Therapy</title>
    <author><first>Lina M.</first><last>Rojas Barahona</last></author>
    <author><first>Bo-Hsiang</first><last>Tseng</last></author>
    <author><first>Yinpei</first><last>Dai</last></author>
    <author><first>Clare</first><last>Mansfield</last></author>
    <author><first>Osman</first><last>Ramadan</last></author>
    <author><first>Stefan</first><last>Ultes</last></author>
    <author><first>Michael</first><last>Crawford</last></author>
    <author><first>Milica</first><last>Gasic</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>44&#8211;54</pages>
    <url>http://www.aclweb.org/anthology/W18-5606</url>
    <abstract>In recent years, we have seen deep learning and distributed representations of words and sentences make an impact on a number of natural language processing tasks, such as similarity, entailment and sentiment analysis. Here we introduce a new task: understanding of mental health concepts derived from Cognitive Behavioural Therapy (CBT). We define a mental health ontology based on CBT principles, annotate a large corpus where these phenomena are exhibited, and perform understanding using deep learning and distributed representations. Our results show that deep learning models combined with word embeddings or sentence embeddings significantly outperform non-deep-learning models in this difficult task. This understanding module will be an essential component of a statistical dialogue system delivering therapy.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rojasbarahona-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5607">
    <title>Investigating the Challenges of Temporal Relation Extraction from Clinical Text</title>
    <author><first>Diana</first><last>Galvan</last></author>
    <author><first>Naoaki</first><last>Okazaki</last></author>
    <author><first>Koji</first><last>Matsuda</last></author>
    <author><first>Kentaro</first><last>Inui</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>55&#8211;64</pages>
    <url>http://www.aclweb.org/anthology/W18-5607</url>
    <abstract>Temporal reasoning remains an unsolved task for Natural Language Processing (NLP), as demonstrated particularly in the clinical domain. The complexity of temporal representation in language is evident from the results of the 2016 Clinical TempEval challenge: current state-of-the-art systems perform well on mention-identification tasks for events and time expressions, but poorly on temporal relation extraction, showing a gap of around 0.25 points below human performance. We explore adapting the tree-based LSTM-RNN model proposed by Miwa and Bansal (2016) to temporal relation extraction from clinical text, obtaining a five-point improvement over the best 2016 Clinical TempEval system and two points over the state of the art. We deliver a deep analysis of the results and discuss the next steps towards human-like temporal reasoning.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>galvan-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5608">
    <title>De-identifying Free Text of Japanese Dummy Electronic Health Records</title>
    <author><first>Kohei</first><last>Kajiyama</last></author>
    <author><first>Hiromasa</first><last>Horiguchi</last></author>
    <author><first>Takashi</first><last>Okumura</last></author>
    <author><first>Mizuki</first><last>Morita</last></author>
    <author><first>Yoshinobu</first><last>Kano</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>65&#8211;70</pages>
    <url>http://www.aclweb.org/anthology/W18-5608</url>
    <abstract>A new law was established in Japan to promote the utilization of EHRs for research and development, while de-identification is required to use EHRs. However, research on automatic anonymization in the healthcare domain is not active for Japanese, and, as far as we know, no de-identification tool with practical performance is available for Japanese medical domains. Previous works show that rule-based methods are still effective, while deep learning methods have recently been reported to perform better. In order to implement and evaluate a de-identification tool at a practical level, we implemented three methods: rule-based, CRF, and LSTM. We prepared three datasets of pseudo EHRs with manually annotated de-identification tags. These datasets are derived from shared task data, to allow comparison with previous works, and from our new data, to increase the amount of training data. Our results show that our LSTM-based method is better and more robust, which leads to our future work: we plan to apply our system to actual de-identification tasks in hospitals.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kajiyama-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5609">
    <title>Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study</title>
    <author><first>Drahomira</first><last>Herrmannova</last></author>
    <author><first>Steven</first><last>Young</last></author>
    <author><first>Robert</first><last>Patton</last></author>
    <author><first>Christopher</first><last>Stahl</last></author>
    <author><first>Nicole</first><last>Kleinstreuer</last></author>
    <author><first>Mary</first><last>Wolfe</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>71&#8211;82</pages>
    <url>http://www.aclweb.org/anthology/W18-5609</url>
    <abstract>Identifying and extracting data elements such as study descriptors in publication full texts is a critical yet manual and labor-intensive step required in a number of tasks. In this paper we address the question of identifying data elements in an unsupervised manner. Specifically, provided a set of criteria describing specific study parameters, such as species, route of administration, and dosing regimen, we develop an unsupervised approach to identify text segments relevant to the criteria. A binary classifier trained to identify publications that met the criteria performs better when trained on the candidate sentences than when trained on sentences randomly picked from the text, supporting the intuition that our method is able to accurately identify study descriptors.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>herrmannova-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5610">
    <title>Identification of Parallel Sentences in Comparable Monolingual Corpora from Different Registers</title>
    <author><first>R&#233;mi</first><last>Cardon</last></author>
    <author><first>Natalia</first><last>Grabar</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>83&#8211;93</pages>
    <url>http://www.aclweb.org/anthology/W18-5610</url>
    <abstract>Parallel aligned sentences provide useful information for different NLP applications. </abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>cardon-grabar:2018:LOUHI</bibkey>
  </paper>

  <paper id="5611">
    <title>Evaluation of a Prototype System that Automatically Assigns Subject Headings to Nursing Narratives Using Recurrent Neural Network</title>
    <author><first>Hans</first><last>Moen</last></author>
    <author><first>Kai</first><last>Hakala</last></author>
    <author><first>Laura-Maria</first><last>Peltonen</last></author>
    <author><first>Henry</first><last>Suhonen</last></author>
    <author><first>Petri</first><last>Loukasmäki</last></author>
    <author><first>Tapio</first><last>Salakoski</last></author>
    <author><first>Filip</first><last>Ginter</last></author>
    <author><first>Sanna</first><last>Salanterä</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>94&#8211;100</pages>
    <url>http://www.aclweb.org/anthology/W18-5611</url>
    <abstract>We present our initial evaluation of a prototype system designed to assist nurses in assigning subject headings to nursing narratives written in the context of documenting patient care in hospitals. Currently, nurses may need to memorize several hundred subject headings from standardized nursing terminologies when structuring their text and assigning the right section/subject headings to it. Our aim is to allow nurses to write in a narrative manner without having to plan and structure the text with respect to sections and subject headings; instead, the system should assist with the assignment of subject headings and with restructuring afterwards. We hypothesize that this could reduce the time and effort needed for nursing documentation in hospitals. A central component of the system is a text classification model based on a long short-term memory (LSTM) recurrent neural network architecture, trained on a large data set of nursing notes. A simple Web-based interface has been implemented for user interaction. To evaluate the system, three nurses wrote a set of artificial nursing shift notes in a fully unstructured narrative manner, without planning for or considering the use of sections and subject headings. These were then fed to the system, which assigned subject headings to each sentence and then grouped the sentences into paragraphs. Manual evaluation was conducted by a group of nurses. The results show that about 70% of the sentences are assigned correct subject headings. The nurses believe that such a system can be of great help in making nursing documentation in hospitals easier and less time consuming. Finally, various measures and approaches for improving the system are discussed.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>moen-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5612">
    <title>Automatically Detecting the Position and Type of Psychiatric Evaluation Report Sections</title>
    <author><first>Deya</first><last>Banisakher</last></author>
    <author><first>Naphtali</first><last>Rishe</last></author>
    <author><first>Mark A.</first><last>Finlayson</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>101&#8211;110</pages>
    <url>http://www.aclweb.org/anthology/W18-5612</url>
    <abstract>Psychiatric evaluation reports represent a rich and still mostly-untapped</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>banisakher-rishe-finlayson:2018:LOUHI</bibkey>
  </paper>

  <paper id="5613">
    <title>Iterative development of family history annotation guidelines using a synthetic corpus of clinical text</title>
    <author><first>Taraka</first><last>Rama</last></author>
    <author><first>Pål</first><last>Brekke</last></author>
    <author><first>Øystein</first><last>Nytrø</last></author>
    <author><first>Lilja</first><last>Øvrelid</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>111&#8211;121</pages>
    <url>http://www.aclweb.org/anthology/W18-5613</url>
    <abstract>In this article, we describe the development of annotation</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rama-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5614">
    <title>CAS: French Corpus with Clinical Cases</title>
    <author><first>Natalia</first><last>Grabar</last></author>
    <author><first>Vincent</first><last>Claveau</last></author>
    <author><first>Cl&#233;ment</first><last>Dalloux</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>122&#8211;128</pages>
    <url>http://www.aclweb.org/anthology/W18-5614</url>
    <abstract>Textual corpora are extremely important for various NLP applications as they provide information necessary for creating, setting and testing these applications and the corresponding tools. They are also crucial for designing reliable methods and reproducible results. Yet, in some areas, such as the medical area, due to confidentiality or ethical reasons, it is complicated and even impossible to access textual data representative of those produced in these areas. We propose the CAS corpus built from clinical cases, as they are reported in the published scientific literature in French. We describe this corpus, currently containing over 397,000 word occurrences, and the existing linguistic and semantic annotations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>grabar-claveau-dalloux:2018:LOUHI</bibkey>
  </paper>

  <paper id="5615">
    <title>Analysis of Risk Factor Domains in Psychosis Patient Health Records</title>
    <author><first>Eben</first><last>Holderness</last></author>
    <author><first>Nicholas</first><last>Miller</last></author>
    <author><first>Kirsten</first><last>Bolton</last></author>
    <author><first>Philip</first><last>Cawkwell</last></author>
    <author><first>Marie</first><last>Meteer</last></author>
    <author><first>James</first><last>Pustejovsky</last></author>
    <author><first>Mei</first><last>Hua-Hall</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>129&#8211;138</pages>
    <url>http://www.aclweb.org/anthology/W18-5615</url>
    <abstract>Readmission after discharge from a hospital is disruptive and costly, regardless of the reason. However, it can be particularly problematic for psychiatric patients, so predicting which patients may be readmitted is critically important but also very difficult. Clinical narratives in psychiatric electronic health records (EHRs) span a wide range of topics and vocabulary; therefore, a psychiatric readmission prediction model must begin with a robust and interpretable topic extraction component. We created a data pipeline for using document vector similarity metrics to perform topic extraction on psychiatric EHR data in service of our long-term goal of creating a readmission risk classifier. We show initial results for our topic extraction model and identify additional features we will be incorporating in the future.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>holderness-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5616">
    <title>Patient Risk Assessment and Warning Symptom Detection Using Deep Attention-Based Neural Networks</title>
    <author><first>Ivan</first><last>Girardi</last></author>
    <author><first>Pengfei</first><last>Ji</last></author>
    <author><first>An-phi</first><last>Nguyen</last></author>
    <author><first>Nora</first><last>Hollenstein</last></author>
    <author><first>Adam</first><last>Ivankay</last></author>
    <author><first>Lorenz</first><last>Kuhn</last></author>
    <author><first>Chiara</first><last>Marchiori</last></author>
    <author><first>Ce</first><last>Zhang</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>139&#8211;148</pages>
    <url>http://www.aclweb.org/anthology/W18-5616</url>
    <abstract>We present an operational component of a real-world patient triage system. Given a specific patient presentation, the system is able to assess the level of medical urgency and issue the most appropriate recommendation in terms of best point of care and time to treat. We use an attention-based convolutional neural network</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>girardi-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5617">
    <title>Syntax-based Transfer Learning for the Task of Biomedical Relation Extraction</title>
    <author><first>Joël</first><last>Legrand</last></author>
    <author><first>Yannick</first><last>Toussaint</last></author>
    <author><first>Chedy</first><last>Raïssi</last></author>
    <author><first>Adrien</first><last>Coulet</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>149&#8211;159</pages>
    <url>http://www.aclweb.org/anthology/W18-5617</url>
    <abstract>Transfer learning (TL) aims to enhance machine learning performance on a problem by reusing labeled data originally designed for a related problem. In particular, domain adaptation consists, for a specific task, in reusing training data developed for the same task but in a distinct domain. This is particularly relevant to applications of deep learning in Natural Language Processing, because these usually require large annotated corpora that may not exist for the targeted domain, but do exist for related domains.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>legrand-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5618">
    <title>In-domain Context-aware Token Embeddings Improve Biomedical Named Entity Recognition</title>
    <author><first>Golnar</first><last>Sheikhshabbafghi</last></author>
    <author><first>Inanc</first><last>Birol</last></author>
    <author><first>Anoop</first><last>Sarkar</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>160&#8211;164</pages>
    <url>http://www.aclweb.org/anthology/W18-5618</url>
    <abstract>The rapidly expanding volume of publications in the biomedical domain makes it increasingly difficult to evaluate the latest literature in a timely manner. That, along with a push for automated evaluation of clinical reports, presents opportunities for effective natural language processing methods. In this study we target the problem of named entity recognition, in which texts are processed to annotate terms that are relevant for biomedical studies. Terms of interest in the domain include gene and protein names, and cell lines and types. Here we report on a pipeline built on Embeddings from Language Models (ELMo) and a deep learning package for natural language processing (AllenNLP). We trained context-aware token embeddings on a dataset of biomedical papers using ELMo, and incorporated these embeddings into the LSTM-CRF model used by AllenNLP for named entity recognition. We show that these representations improve named entity recognition for different types of biomedical named entities. We also achieve a new state of the art in gene mention detection on the BioCreative II gene mention shared task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sheikhshabbafghi-birol-sarkar:2018:LOUHI</bibkey>
  </paper>

  <paper id="5619">
    <title>Self-training improves Recurrent Neural Networks performance for Temporal Relation Extraction</title>
    <author><first>Chen</first><last>Lin</last></author>
    <author><first>Timothy</first><last>Miller</last></author>
    <author><first>Dmitriy</first><last>Dligach</last></author>
    <author><first>Hadi</first><last>Amiri</last></author>
    <author><first>Steven</first><last>Bethard</last></author>
    <author><first>Guergana</first><last>Savova</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>165&#8211;176</pages>
    <url>http://www.aclweb.org/anthology/W18-5619</url>
    <abstract>Neural network models are oftentimes restricted by limited labeled instances and resort to advanced architectures and features for cutting-edge performance. We propose to build a recurrent neural network with multiple semantically heterogeneous embeddings within a self-training framework. Our framework makes use of labeled, unlabeled, and social media data, operates on basic features, and is scalable and generalizable. With this method, we establish state-of-the-art results for a clinical temporal relation extraction task in both in-domain and cross-domain settings.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lin-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5620">
    <title>Listwise temporal ordering of events in clinical notes</title>
    <author><first>Serena</first><last>Jeblee</last></author>
    <author><first>Graeme</first><last>Hirst</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>177&#8211;182</pages>
    <url>http://www.aclweb.org/anthology/W18-5620</url>
    <abstract>We present metrics for listwise temporal ordering of events in clinical notes, as well as a baseline listwise temporal ranking model that generates a timeline of events that can be used in downstream medical natural language processing tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jeblee-hirst:2018:LOUHI</bibkey>
  </paper>

  <paper id="5621">
    <title>Time Expressions in Mental Health Records for Symptom Onset Extraction</title>
    <author><first>Natalia</first><last>Viani</last></author>
    <author><first>Lucia</first><last>Yin</last></author>
    <author><first>Joyce</first><last>Kam</last></author>
    <author><first>Ayunni</first><last>Alawi</last></author>
    <author><first>Andr&#233;</first><last>Bittar</last></author>
    <author><first>Rina</first><last>Dutta</last></author>
    <author><first>Rashmi</first><last>Patel</last></author>
    <author><first>Robert</first><last>Stewart</last></author>
    <author><first>Sumithra</first><last>Velupillai</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>183&#8211;192</pages>
    <url>http://www.aclweb.org/anthology/W18-5621</url>
    <abstract>For psychiatric disorders such as schizophrenia, longer durations of untreated psychosis are associated with worse intervention outcomes. Data included in electronic health records (EHRs) can be useful for retrospective clinical studies, but much of this is stored as unstructured text which cannot be directly used in computation. Natural Language Processing (NLP) methods can be used to extract this data, in order to identify symptoms and treatments from mental health records, and temporally anchor the first emergence of these. We are developing an EHR corpus annotated with time expressions, clinical entities and their relations, to be used for NLP development.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>viani-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5622">
    <title>Evaluation of a Sequence Tagging Tool for Biomedical Texts</title>
    <author><first>Julien</first><last>Tourille</last></author>
    <author><first>Matthieu</first><last>Doutreligne</last></author>
    <author><first>Olivier</first><last>Ferret</last></author>
    <author><first>Aur&#233;lie</first><last>N&#233;v&#233;ol</last></author>
    <author><first>Nicolas</first><last>Paris</last></author>
    <author><first>Xavier</first><last>Tannier</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>193&#8211;203</pages>
    <url>http://www.aclweb.org/anthology/W18-5622</url>
    <abstract>Many applications in biomedical natural language processing rely on sequence tagging as an initial step to perform more complex analysis. To support text analysis in the biomedical domain, we introduce Yet Another SEquence Tagger (YASET), an open-source multi-purpose sequence tagger that implements state-of-the-art deep learning algorithms for sequence tagging. Herein, we evaluate YASET on part-of-speech tagging and named entity recognition in a variety of text genres including articles from the biomedical literature in English and clinical narratives in French. To further characterize performance, we report distributions over 30 runs and different sizes of training datasets. YASET provides state-of-the-art performance on the CoNLL 2003 NER dataset (F1=0.87), MEDPOST corpus (F1=0.97), MERLoT corpus (F1=0.99) and NCBI disease corpus (F1=0.81). We believe that YASET is a versatile and efficient tool that can be used for sequence tagging in biomedical and clinical texts.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tourille-EtAl:2018:LOUHI</bibkey>
  </paper>

  <paper id="5623">
    <title>Learning to Summarize Radiology Findings</title>
    <author><first>Yuhao</first><last>Zhang</last></author>
    <author><first>Daisy Yi</first><last>Ding</last></author>
    <author><first>Tianpei</first><last>Qian</last></author>
    <author><first>Christopher D.</first><last>Manning</last></author>
    <author><first>Curtis P.</first><last>Langlotz</last></author>
    <booktitle>Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis</booktitle>
    <month>October</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>204&#8211;213</pages>
    <url>http://www.aclweb.org/anthology/W18-5623</url>
    <abstract>The Impression section of a radiology report summarizes crucial radiology findings in natural language and plays a central role in communicating these findings to physicians. However, the process of generating impressions by summarizing findings is time-consuming for radiologists and prone to errors. We propose to automate the generation of radiology impressions with neural sequence-to-sequence learning. We further propose a customized neural model for this task which learns to encode the study background information and use this information to guide the decoding process. On a large dataset of radiology reports collected from actual hospital studies, our model outperforms existing non-neural and neural baselines under the ROUGE metrics. In a blind experiment, a board-certified radiologist indicated that 67% of sampled system summaries are at least as good as the corresponding human-written summaries, suggesting significant clinical validity. To our knowledge our work represents the first attempt in this direction.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhang-EtAl:2018:LOUHI</bibkey>
  </paper>

</volume>