<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W16">
  <paper id="4200">
    <title>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</title>
    <editor>Anna Rumshisky</editor>
    <editor>Kirk Roberts</editor>
    <editor>Steven Bethard</editor>
    <editor>Tristan Naumann</editor>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <url>http://aclweb.org/anthology/W16-42</url>
    <bibtype>book</bibtype>
    <bibkey>ClinicalNLP:2016</bibkey>
  </paper>

  <paper id="4201">
    <title>The impact of simple feature engineering in multilingual medical NER</title>
    <author><first>Rebecka</first><last>Weegar</last></author>
    <author><first>Arantza</first><last>Casillas</last></author>
    <author><first>Arantza</first><last>Diaz de Ilarraza</last></author>
    <author><first>Maite</first><last>Oronoz</last></author>
    <author><first>Alicia</first><last>P&#233;rez</last></author>
    <author><first>Koldo</first><last>Gojenola</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>1&#8211;6</pages>
    <url>http://aclweb.org/anthology/W16-4201</url>
    <abstract>The goal of this paper is to examine the impact of simple feature engineering
	mechanisms before applying more sophisticated techniques to the task of medical
	NER. Sometimes papers using scientifically sound techniques present raw
	baselines that could be improved by adding simple and cheap features. This work
	focuses on entity recognition for the clinical domain for three languages:
	English, Swedish and Spanish. The task is tackled using simple features,
	starting from the window size, capitalization, prefixes, and moving to POS and
	semantic tags. This work demonstrates that a simple initial step of feature
	engineering can improve the baseline results significantly. Hence, the
	contributions of this paper are: first, a short list of guidelines well
	supported with experimental results on three languages and, second, a detailed
	description of the relevance of these features for medical NER.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>weegar-EtAl:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4202">
    <title>Bidirectional LSTM-CRF for Clinical Concept Extraction</title>
    <author><first>Raghavendra</first><last>Chalapathy</last></author>
    <author><first>Ehsan</first><last>Zare Borzeshi</last></author>
    <author><first>Massimo</first><last>Piccardi</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>7&#8211;12</pages>
    <url>http://aclweb.org/anthology/W16-4202</url>
    <abstract>Automated extraction of concepts from patient clinical records is an essential
	facilitator of clinical research. For this reason, the 2010 i2b2/VA Natural
	Language Processing Challenges for Clinical Records introduced a concept extraction task
	aimed at identifying and classifying concepts into predefined categories (i.e.,
	treatments, tests and problems). State-of-the-art concept extraction approaches
	heavily rely on handcrafted features and domain-specific resources which are
	hard to collect and define. For this reason, this paper proposes an
	alternative, streamlined approach: a recurrent neural network (the
	bidirectional LSTM with CRF decoding) initialized with general-purpose,
	off-the-shelf word embeddings. The experimental results achieved on the 2010
	i2b2/VA reference corpora using the proposed framework outperform all recent
	methods and rank close to the best submission from the original 2010 i2b2/VA
	challenge.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chalapathy-zareborzeshi-piccardi:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4203">
    <title>MedNLPDoc: Japanese Shared Task for Clinical NLP</title>
    <author><first>Eiji</first><last>Aramaki</last></author>
    <author><first>Yoshinobu</first><last>Kano</last></author>
    <author><first>Tomoko</first><last>Ohkuma</last></author>
    <author><first>Mizuki</first><last>Morita</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>13&#8211;16</pages>
    <url>http://aclweb.org/anthology/W16-4203</url>
    <abstract>Due to the recent replacement of physical documents with electronic medical
	records (EMRs), the importance of information processing in the medical field has
	increased. We have been organizing the MedNLP task series at NTCIR-10 and
	NTCIR-11. These workshops were the first shared tasks to evaluate
	technologies that retrieve important information from medical reports written
	in Japanese. In this report, we describe the NTCIR-12 MedNLPDoc task, which is
	designed for more advanced and practical use in the medical field. The task
	is formulated as multi-label classification of each patient record. This report
	presents the results of the shared task and discusses and illustrates remaining issues
	in the medical natural language processing field.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>aramaki-EtAl:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4204">
    <title>Feature-Augmented Neural Networks for Patient Note De-identification</title>
    <author><first>Ji Young</first><last>Lee</last></author>
    <author><first>Franck</first><last>Dernoncourt</last></author>
    <author><first>Ozlem</first><last>Uzuner</last></author>
    <author><first>Peter</first><last>Szolovits</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>17&#8211;22</pages>
    <url>http://aclweb.org/anthology/W16-4204</url>
    <abstract>Patient notes contain a wealth of information of potentially great interest to
	medical investigators. However, to protect patients' privacy, Protected Health
	Information (PHI) must be removed from the patient notes before they can be
	legally released, a process known as patient note de-identification. The main
	objective for a de-identification system is to have the highest possible
	recall. Recently, the first neural-network-based de-identification system has
	been proposed, yielding state-of-the-art results. Unlike other systems, it does
	not rely on human-engineered features, which allows it to be quickly deployed,
	but does not leverage knowledge from human experts or from electronic health
	records (EHRs). In this work, we explore a method to incorporate
	human-engineered features as well as features derived from EHRs into a
	neural-network-based de-identification system. Our results show that the
	addition of features, especially the EHR-derived features, further improves the
	state of the art in patient note de-identification, including for some of the
	most sensitive PHI types such as patient names. Since in a real-life setting
	patient notes typically come with EHRs, we recommend that developers of
	de-identification systems leverage the information EHRs contain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lee-EtAl:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4205">
    <title>Semi-supervised Clustering of Medical Text</title>
    <author><first>Pracheta</first><last>Sahoo</last></author>
    <author><first>Asif</first><last>Ekbal</last></author>
    <author><first>Sriparna</first><last>Saha</last></author>
    <author><first>Diego</first><last>Molla</last></author>
    <author><first>Kaushik</first><last>Nandan</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>23&#8211;31</pages>
    <url>http://aclweb.org/anthology/W16-4205</url>
    <abstract>Semi-supervised clustering is an attractive alternative to traditional
	(unsupervised) clustering in targeted applications. By using information from
	a small annotated dataset, semi-supervised clustering can produce clusters that
	are customized to the application domain. In this paper, we present a
	semi-supervised clustering technique based on a multi-objective evolutionary
	algorithm (NSGA-II-clus). We apply this technique to the task of clustering
	medical publications for Evidence Based Medicine (EBM) and observe improved
	results compared with unsupervised and other semi-supervised clustering
	techniques.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sahoo-EtAl:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4206">
    <title>Deep Learning Architecture for Patient Data De-identification in Clinical Records</title>
    <author><first>Shweta</first><last>Yadav</last></author>
    <author><first>Asif</first><last>Ekbal</last></author>
    <author><first>Sriparna</first><last>Saha</last></author>
    <author><first>Pushpak</first><last>Bhattacharyya</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>32&#8211;41</pages>
    <url>http://aclweb.org/anthology/W16-4206</url>
    <abstract>Rapid growth in Electronic Medical Records (EMRs) has led to an expansion of
	data in the clinical domain. The majority of the available health care
	information is locked in narrative documents, which form a rich source of
	clinical information. Text mining of such clinical records has gained
	considerable attention in various medical applications, such as treatment and
	decision making. However, medical records contain patients' Private Health
	Information (PHI), which can reveal the identities of the patients. To
	preserve patient privacy, all PHI must be removed before the records are made
	publicly available. The aim is to de-identify or encrypt the PHI in patient
	medical records. In this paper, we propose an algorithm based on a deep
	learning architecture to solve this problem. We perform de-identification of
	seven types of PHI in clinical records. Experiments on benchmark datasets
	show that our proposed approach achieves encouraging performance,
	outperforming a baseline model based on Conditional Random Fields.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yadav-EtAl:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4207">
    <title>Neural Clinical Paraphrase Generation with Attention</title>
    <author><first>Sadid A.</first><last>Hasan</last></author>
    <author><first>Bo</first><last>Liu</last></author>
    <author><first>Joey</first><last>Liu</last></author>
    <author><first>Ashequl</first><last>Qadir</last></author>
    <author><first>Kathy</first><last>Lee</last></author>
    <author><first>Vivek</first><last>Datla</last></author>
    <author><first>Aaditya</first><last>Prakash</last></author>
    <author><first>Oladimeji</first><last>Farri</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>42&#8211;53</pages>
    <url>http://aclweb.org/anthology/W16-4207</url>
    <abstract>Paraphrase generation is important in various applications such as search,
	summarization, and question answering due to its ability to generate textual
	alternatives while keeping the overall meaning intact. Clinical paraphrase
	generation is especially vital in building patient-centric clinical decision
	support (CDS) applications where users are able to understand complex clinical
	jargon via easily comprehensible alternative paraphrases. This paper presents
	Neural Clinical Paraphrase Generation (NCPG), a novel approach that casts the
	task as a monolingual neural machine translation (NMT) problem. We propose an
	end-to-end neural network built on an attention-based bidirectional Recurrent
	Neural Network (RNN) architecture with an encoder-decoder framework to perform
	the task. Conventional bilingual NMT models mostly rely on word-level modeling
	and are often limited by out-of-vocabulary (OOV) issues. In contrast, we
	represent the source and target paraphrase pairs as character sequences to
	address this limitation. To the best of our knowledge, this is the first work
	that uses attention-based RNNs for clinical paraphrase generation and also
	proposes end-to-end character-level modeling for this task. Extensive
	experiments on a large curated clinical paraphrase corpus show that the
	attention-based NCPG models achieve improvements of up to 5.2 BLEU points and
	0.5 METEOR points over a strong non-attention-based baseline for word-level
	modeling, whereas further gains of up to 6.1 BLEU points and 1.3 METEOR points
	are obtained by the character-level NCPG models over their word-level
	counterparts. Overall, our models demonstrate comparable performance relative
	to the state-of-the-art phrase-based non-neural models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hasan-EtAl:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4208">
    <title>Assessing the Corpus Size vs. Similarity Trade-off for Word Embeddings in Clinical NLP</title>
    <author><first>Kirk</first><last>Roberts</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>54&#8211;63</pages>
    <url>http://aclweb.org/anthology/W16-4208</url>
    <abstract>The proliferation of deep learning methods in natural language processing (NLP)
	and the large amounts of data they often require stand in stark contrast to
	the relatively data-poor clinical NLP domain. In particular, large text corpora
	are necessary to build high-quality word embeddings, yet large corpora
	that are suitably representative of the target clinical data are often unavailable.
	This forces a choice between building embeddings from small clinical corpora
	and less representative, larger corpora. This paper explores this trade-off, as
	well as intermediate compromise solutions. Two standard clinical NLP tasks (the
	i2b2 2010 concept and assertion tasks) are evaluated with commonly used deep
	learning models (recurrent neural networks and convolutional neural networks)
	using a set of six corpora ranging from the target i2b2 data to large
	open-domain datasets. While combinations of corpora are generally found to work
	best, the single-best corpus is generally task-dependent.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>roberts:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4209">
    <title>Inference of ICD Codes from Japanese Medical Records by Searching Disease Names</title>
    <author><first>Masahito</first><last>Sakishita</last></author>
    <author><first>Yoshinobu</first><last>Kano</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>64&#8211;68</pages>
    <url>http://aclweb.org/anthology/W16-4209</url>
    <abstract>The importance of utilizing medical information is increasing as electronic
	health records (EHRs) become widely used. We aim to assign internationally
	standardized disease codes (ICD-10) to Japanese textual information in EHRs so
	that users can reuse the information accurately. In this paper, we propose
	methods to automatically extract diagnoses and assign ICD codes to Japanese
	medical records. Due to the lack of available training data, we deliberately
	employed rule-based methods rather than machine learning. We carefully
	examined the characteristics of medical records and wrote effective rules by
	hand. We applied our system to the NTCIR-12 MedNLPDoc shared task data, where
	participants are required to assign ICD-10 codes for possible diagnoses in
	given EHRs. In this shared task, our system achieved the highest F-measure
	among all participants under the strictest evaluation criterion. Through
	comparison with other approaches, we show that our approach could be a useful
	milestone for the future development of Japanese medical record processing.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sakishita-kano:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4210">
    <title>A fine-grained corpus annotation schema of German nephrology records</title>
    <author><first>Roland</first><last>Roller</last></author>
    <author><first>Hans</first><last>Uszkoreit</last></author>
    <author><first>Feiyu</first><last>Xu</last></author>
    <author><first>Laura</first><last>Seiffe</last></author>
    <author><first>Michael</first><last>Mikhailov</last></author>
    <author><first>Oliver</first><last>Staeck</last></author>
    <author><first>Klemens</first><last>Budde</last></author>
    <author><first>Fabian</first><last>Halleck</last></author>
    <author><first>Danilo</first><last>Schmidt</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>69&#8211;77</pages>
    <url>http://aclweb.org/anthology/W16-4210</url>
    <abstract>In this work we present a fine-grained annotation schema to detect named
	entities in German clinical data of chronically ill patients with kidney
	diseases. The annotation schema is driven by the needs of our clinical partners
	and the linguistic aspects of the German language. In order to generate
	annotations within a short period, the work also presents a semi-automatic
	annotation approach that uses additional knowledge sources, such as UMLS, to
	pre-annotate concepts. The presented schema will be used to apply novel
	techniques from natural language processing and machine learning to support
	doctors in treating their patients through improved information access from
	unstructured German texts.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>roller-EtAl:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4211">
    <title>Detecting Japanese Patients with Alzheimer’s Disease based on Word Category Frequencies</title>
    <author><first>Daisaku</first><last>Shibata</last></author>
    <author><first>Shoko</first><last>Wakamiya</last></author>
    <author><first>Ayae</first><last>Kinoshita</last></author>
    <author><first>Eiji</first><last>Aramaki</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>78&#8211;85</pages>
    <url>http://aclweb.org/anthology/W16-4211</url>
    <abstract>In recent years, detecting Alzheimer's disease (AD) in its early stages using
	natural language processing (NLP) has drawn much attention. To date, vocabulary
	size, grammatical complexity, and fluency have been studied using NLP metrics.
	However, content analysis of AD narratives remains beyond the reach of NLP.
	This study investigates features of the words that AD patients use in their
	spoken language. Eighteen examinees aged 53&#8211;90 years (mean: 76.89) were
	recruited and divided into two groups based on their MMSE scores. The AD group
	comprised 9 examinees with scores of 21 or lower. The healthy control group
	comprised 9 examinees with scores of 22 or higher. Linguistic Inquiry and Word
	Count (LIWC) categories were used to classify the words that the examinees
	used, and word category frequencies were then computed. Significant
	differences were confirmed in the use of impersonal pronouns by the AD group.
	This result demonstrates the basic feasibility of the proposed NLP-based
	detection approach.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shibata-EtAl:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4212">
    <title>Prediction of Key Patient Outcome from Sentence and Word of Medical Text Records</title>
    <author><first>Takanori</first><last>Yamashita</last></author>
    <author><first>Yoshifumi</first><last>Wakata</last></author>
    <author><first>Hidehisa</first><last>Soejima</last></author>
    <author><first>Naoki</first><last>Nakashima</last></author>
    <author><first>Sachio</first><last>Hirokawa</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>86&#8211;90</pages>
    <url>http://aclweb.org/anthology/W16-4212</url>
    <abstract>The number of unstructured medical records kept in hospital information systems
	is increasing. The conditions of patients are formulated as outcomes in
	clinical pathways. A variance of an outcome describes a deviation from the
	standard of care, such as a patient's poor condition. This paper applies text
	mining to extract feature words and phrases of variances from admission
	records. We report the cases of the variances &#8220;pain control&#8221; and
	&#8220;no neuropathy worsening&#8221; in cerebral infarction.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yamashita-EtAl:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4213">
    <title>Unsupervised Abbreviation Detection in Clinical Narratives</title>
    <author><first>Markus</first><last>Kreuzthaler</last></author>
    <author><first>Michel</first><last>Oleynik</last></author>
    <author><first>Alexander</first><last>Avian</last></author>
    <author><first>Stefan</first><last>Schulz</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>91&#8211;98</pages>
    <url>http://aclweb.org/anthology/W16-4213</url>
    <abstract>Clinical narratives in electronic health record systems are a rich resource of
	patient-based information. They constitute an ongoing challenge for natural
	language processing, due to their high compactness and abundance of short
	forms. German medical texts exhibit numerous ad-hoc abbreviations that
	terminate with a period character. The disambiguation of period characters is
	therefore an important task for sentence and abbreviation detection. This task
	is addressed by a combination of co-occurrence information of word types with
	trailing period characters, a large domain dictionary, and a simple rule
	engine, thus merging statistical and dictionary-based disambiguation
	strategies. The unsupervised approach presented in this paper reached an
	F-measure of 0.95. The results are promising for a domain-independent
	abbreviation detection strategy, because our approach avoids retraining models
	and the use-case-specific feature engineering that supervised machine learning
	approaches require.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kreuzthaler-EtAl:2016:ClinicalNLP</bibkey>
  </paper>

  <paper id="4214">
    <title>Automated Anonymization as Spelling Variant Detection</title>
    <author><first>Steven Kester</first><last>Yuwono</last></author>
    <author><first>Hwee Tou</first><last>Ng</last></author>
    <author><first>Kee Yuan</first><last>Ngiam</last></author>
    <booktitle>Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>99&#8211;103</pages>
    <url>http://aclweb.org/anthology/W16-4214</url>
    <abstract>The issue of privacy has always been a concern when clinical texts are used for
	research purposes. Personal health information (PHI) (such as name and
	identification number) needs to be removed so that patients cannot be
	identified. Manual anonymization is not feasible due to the large number of
	clinical texts to be anonymized. In this paper, we tackle the task of
	anonymizing clinical texts that are written in sentence fragments and
	frequently contain symbols, abbreviations, and misspelled words. Our clinical
	texts therefore differ from those in the i2b2 shared tasks, which are in prose
	form with complete sentences. Our clinical texts are also part of a structured
	database that contains patient names and identification numbers in structured
	fields. As such, we formulate our anonymization task as spelling variant
	detection, exploiting patients' personal information in the structured fields
	to detect their spelling variants in clinical texts. We successfully anonymized
	clinical texts consisting of more than 200 million words, using minimum edit
	distance and regular expression patterns.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yuwono-ng-ngiam:2016:ClinicalNLP</bibkey>
  </paper>

</volume>

