<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W16">
  <paper id="4700">
    <title>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</title>
    <editor>Patrick Drouin</editor>
    <editor>Natalia Grabar</editor>
    <editor>Thierry Hamon</editor>
    <editor>Kyo Kageura</editor>
    <editor>Koichi Takeuchi</editor>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <url>http://aclweb.org/anthology/W16-47</url>
    <bibtype>book</bibtype>
    <bibkey>Computerm2016:2016</bibkey>
  </paper>

  <paper id="4701">
    <title>Analyzing Impact, Trend, and Diffusion of Knowledge associated with Neoplasms Research</title>
    <author><first>Min</first><last>Song</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>1</pages>
    <url>http://aclweb.org/anthology/W16-4701</url>
    <abstract>Cancer (a.k.a. neoplasms in a broader sense) is one of the leading causes of
	death worldwide, and its incidence is expected to rise. To respond to this
	critical societal need, the cancer research community has made rigorous
	attempts to develop treatments for cancer. Accordingly, we observe a surge in
	the sheer volume of research products and outcomes in relation to neoplasms.
	In this talk, we introduce the notion of entitymetrics to provide a new lens
	for understanding the impact, trend, and diffusion of knowledge associated with
	neoplasms research. To this end, we collected over two million records from
	PubMed, the most popular search engine in the medical domain. Coupled with text
	mining techniques including named entity recognition, sentence boundary
	detection, and approximate string matching, entitymetrics enables us to analyze
	knowledge diffusion, impact, and trend at the level of various knowledge entity
	units, such as bio-entities, organizations, and countries.
	At the end of the talk, the future applications and possible directions of
	entitymetrics will be discussed.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>song:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4702">
    <title>Local-Global Vectors to Improve Unigram Terminology Extraction</title>
    <author><first>Ehsan</first><last>Amjadian</last></author>
    <author><first>Diana</first><last>Inkpen</last></author>
    <author><first>Tahereh</first><last>Paribakht</last></author>
    <author><first>Farahnaz</first><last>Faez</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>2&#8211;11</pages>
    <url>http://aclweb.org/anthology/W16-4702</url>
    <abstract>The present paper explores a novel method that integrates efficient distributed
	representations with terminology extraction. We show that the information from
	a small number of observed instances can be combined with local and global word
	embeddings to remarkably improve the term extraction results on unigram terms.
	To do so, we pass the terms extracted by other tools to a filter made of the
	local-global embeddings and a classifier, which in turn decides whether or not a
	term candidate is a term. The filter can also be used as a hub to merge
	different term extraction tools into a single higher-performing system. We
	compare filters that use the skip-gram architecture and filters that employ the
	CBOW architecture for the task at hand.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>amjadian-EtAl:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4703">
    <title>Recognition of non-domain phrases in automatically extracted lists of terms</title>
    <author><first>Agnieszka</first><last>Mykowiecka</last></author>
    <author><first>Malgorzata</first><last>Marciniak</last></author>
    <author><first>Piotr</first><last>Rychlik</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>12&#8211;20</pages>
    <url>http://aclweb.org/anthology/W16-4703</url>
    <abstract>In the paper, we address the problem of recognition of non-domain phrases in
	terminology lists obtained with an automatic term extraction tool. We focus on
	identification of multi-word phrases that are general terms and discourse
	function expressions. We tested several methods based on domain corpora
	comparison and a method based on contexts of phrases identified in a large
	corpus of general language. We compared the results of the methods to manual
	annotation. The results show that the task is quite hard as the inter-annotator
	agreement is low. Several tested methods achieved similar overall results,
	although the phrase ordering varied between methods. The most successful method,
	with a precision of about 0.75 at the halfway point of the tested list, was the
	context-based method using a modified contextual diversity coefficient.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mykowiecka-marciniak-rychlik:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4704">
    <title>Contextual term equivalent search using domain-driven disambiguation</title>
    <author><first>Caroline</first><last>Barriere</last></author>
    <author><first>Pierre Andr&#233;</first><last>M&#233;nard</last></author>
    <author><first>Daphn&#233;e</first><last>Azoulay</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>21&#8211;29</pages>
    <url>http://aclweb.org/anthology/W16-4704</url>
    <abstract>This article presents a domain-driven algorithm for the task of term sense
	disambiguation (TSD).  TSD aims at automatically choosing which term record
	from a term bank best represents the meaning of a term occurring in a
	particular context.  In a translation environment, finding the contextually
	appropriate term record is necessary to access the proper equivalent to be used
	in the target language text. The term bank TERMIUM Plus, recently published as
	an open access repository, is chosen as a domain-rich resource for testing our
	TSD algorithm, using English and French as source and target languages.  We
	devise an experiment using over 1300 English terms found in scientific
	articles, and show that our domain-driven TSD algorithm is able to rank the
	best term record, and therefore the best French equivalent, at an average rank
	of 1.69, compared to a baseline random rank of 3.51.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>barriere-menard-azoulay:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4705">
    <title>A Method of Augmenting Bilingual Terminology by Taking Advantage of the Conceptual Systematicity of Terminologies</title>
    <author><first>Miki</first><last>Iwai</last></author>
    <author><first>Koichi</first><last>Takeuchi</last></author>
    <author><first>Kyo</first><last>Kageura</last></author>
    <author><first>Kazuya</first><last>Ishibashi</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>30&#8211;40</pages>
    <url>http://aclweb.org/anthology/W16-4705</url>
    <abstract>In this paper, we propose a method of augmenting existing bilingual
	terminologies. Our method belongs to a "generate and validate" framework rather
	than extraction from corpora. 
	Although many studies have proposed methods to find term translations or to
	augment terminology within a "generate and validate" framework, few have taken
	full advantage of the systematic nature of terminologies.
	A terminology of a domain represents the conceptual system of the domain fairly
	systematically, and we contend that fully exploiting this systematicity will
	greatly contribute to the effective augmentation of terminologies. This paper
	proposes and evaluates a novel method to generate bilingual term candidates by
	using existing terminologies and delving into their systematicity. Experiments
	have shown that our method can generate 
	much better term candidate pairs than the existing method and give improved
	performance for terminology augmentation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>iwai-EtAl:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4706">
    <title>Acquisition of semantic relations between terms: how far can we get with standard NLP tools?</title>
    <author><first>Ina</first><last>Roesiger</last></author>
    <author><first>Julia</first><last>Bettinger</last></author>
    <author><first>Johannes</first><last>Sch&#228;fer</last></author>
    <author><first>Michael</first><last>Dorna</last></author>
    <author><first>Ulrich</first><last>Heid</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>41&#8211;51</pages>
    <url>http://aclweb.org/anthology/W16-4706</url>
    <abstract>The extraction of data exemplifying relations between terms can make use, at
	least to a large extent, of techniques that are similar to those used in
	standard hybrid term candidate extraction, namely basic corpus analysis tools
	(e.g. tagging, lemmatization, parsing), as well as morphological analysis of
	complex words (compounds and derived items). In this article, we discuss the
	use of such techniques for the extraction of raw material for a description of
	relations between terms, and we provide internal evaluation data for the
	devices developed.
	We claim that user-generated content is a rich source of term variation through
	paraphrasing and reformulation, and that these provide relational data at the
	same time as term variants. Germanic languages with their rich word formation
	morphology may be particularly good candidates for the approach advocated here.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>roesiger-EtAl:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4707">
    <title>Evaluation of distributional semantic models: a holistic approach</title>
    <author><first>Gabriel</first><last>Bernier-Colborne</last></author>
    <author><first>Patrick</first><last>Drouin</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>52&#8211;61</pages>
    <url>http://aclweb.org/anthology/W16-4707</url>
    <abstract>We investigate how both model-related factors and application-related factors
	affect the accuracy of distributional semantic models (DSMs) in the context of
	specialized lexicography, and how these factors interact. This holistic
	approach to the evaluation of DSMs provides valuable guidelines for the use of
	these models and insight into the kind of semantic information they capture.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>berniercolborne-drouin:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4708">
    <title>A Study on the Interplay Between the Corpus Size and Parameters of a Distributional Model for Term Classification</title>
    <author><first>Behrang</first><last>QasemiZadeh</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>62&#8211;72</pages>
    <url>http://aclweb.org/anthology/W16-4708</url>
    <abstract>We propose and evaluate a method for identifying co-hyponym lexical units in a
	terminological resource. The principles of term recognition and distributional
	semantics are combined to extract terms that belong to the same concept category.
	Given a set of candidate terms, random projections are employed to represent
	them as low-dimensional vectors. These vectors are derived automatically from
	the frequency of the co-occurrences of the candidate terms and words that
	appear within windows of text in their proximity (context-windows). In a
	$k$-nearest neighbours framework, these vectors are classified using a small
	set of manually annotated terms which exemplify concept categories. We then
	investigate the interplay between the size of the corpus that is used for
	collecting the co-occurrences and a number of factors that play roles in the
	performance of the proposed method: the configuration of context-windows for
	collecting co-occurrences, the selection of neighbourhood size ($k$), and the
	choice of similarity metric.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>qasemizadeh:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4709">
    <title>Pattern-based Word Sketches for the Extraction of Semantic Relations</title>
    <author><first>Pilar</first><last>Le&#243;n-Ara&#250;z</last></author>
    <author><first>Antonio</first><last>San Mart&#237;n</last></author>
    <author><first>Pamela</first><last>Faber</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>73&#8211;82</pages>
    <url>http://aclweb.org/anthology/W16-4709</url>
    <abstract>Despite advances in computer technology, terminologists still tend to rely on
	manual work to extract all the semantic information that they need for the
	description of specialized concepts. In this paper we propose the creation of
	new word sketches in Sketch Engine for the extraction of semantic relations.
	Following a pattern-based approach, new sketch grammars are developed in order
	to extract some of the most common semantic relations used in the field of
	terminology: generic-specific, part-whole, location, cause and function.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>leonarauz-sanmartin-faber:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4710">
    <title>Constructing and Evaluating Controlled Bilingual Terminologies</title>
    <author><first>Rei</first><last>Miyata</last></author>
    <author><first>Kyo</first><last>Kageura</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>83&#8211;93</pages>
    <url>http://aclweb.org/anthology/W16-4710</url>
    <abstract>This paper presents the construction and evaluation of Japanese and English
	controlled bilingual terminologies that are particularly intended for
	controlled authoring and machine translation with special reference to the
	Japanese municipal domain. Our terminologies are constructed by extracting
	terms from municipal website texts, and the term variations are controlled by
	defining preferred and proscribed terms for both the source Japanese and the
	target English. To assess the coverage of the terms/concepts in the municipal
	domain and validate the quality of the control, we employ a quantitative
	extrapolation method that estimates the potential vocabulary size. Using
	Large-Number-of-Rare-Event (LNRE) modelling, we compare two parameters: (1)
	uncontrolled and controlled and (2) Japanese and English. The results show that
	our terminologies currently cover about 45&#8211;65% of the terms and 50&#8211;65%
	of the concepts in the municipal domain, and are well controlled. The detailed
	analysis of the growth patterns of the terminologies also provides insight into
	the extent to which we can enlarge the terminologies within a realistic range.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>miyata-kageura:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4711">
    <title>Providing and Analyzing NLP Terms for our Community</title>
    <author><first>Gil</first><last>Francopoulo</last></author>
    <author><first>Joseph</first><last>Mariani</last></author>
    <author><first>Patrick</first><last>Paroubek</last></author>
    <author><first>Fr&#233;d&#233;ric</first><last>Vernier</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>94&#8211;103</pages>
    <url>http://aclweb.org/anthology/W16-4711</url>
    <abstract>By its own nature, the Natural Language Processing (NLP) community is a priori
	the best equipped to study the evolution of its own publications, but works in
	this direction are rare and only recently have we seen a few attempts at
	charting the field. In this paper, we use the algorithms, resources, standards,
	tools and common practices of the NLP field to build a list of terms
	characteristic of ongoing research, by mining a large corpus of scientific
	publications, aiming at the largest possible exhaustivity and covering the
	largest possible time span. Study of the evolution of this term list through
	time reveals interesting insights into the dynamics of the field. The
	availability of the term database and of (a large part of) the corpus makes
	possible many further comparative studies, in addition to providing a test
	field for a new graphic interface designed to perform visual time analytics of
	large-sized thesauri.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>francopoulo-EtAl:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4712">
    <title>Evaluating a dictionary of human phenotype terms focusing on rare diseases</title>
    <author><first>Simon</first><last>Kocbek</last></author>
    <author><first>Toyofumi</first><last>Fujiwara</last></author>
    <author><first>Jin-Dong</first><last>Kim</last></author>
    <author><first>Toshihisa</first><last>Takagi</last></author>
    <author><first>Tudor</first><last>Groza</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>104&#8211;109</pages>
    <url>http://aclweb.org/anthology/W16-4712</url>
    <abstract>Annotating medical text such as clinical notes with human phenotype descriptors
	is an important task that can, for example, assist in building patient
	profiles. To automatically annotate text one usually needs a dictionary of
	predefined terms. However, due to the variety of human expressiveness, current
	state-of-the-art phenotype concept recognizers and automatic annotators
	struggle with domain-specific issues and challenges. In this paper we present
	results of annotating a gold standard corpus with a dictionary containing
	lexical variants for the Human Phenotype Ontology terms. The main purpose of
	the dictionary is to improve the recall of phenotype concept recognition
	systems. We compare the method with four other approaches and present results.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kocbek-EtAl:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4713">
    <title>A semi automatic annotation approach for ontological and terminological knowledge acquisition</title>
    <author><first>Driss</first><last>Sadoun</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>110&#8211;120</pages>
    <url>http://aclweb.org/anthology/W16-4713</url>
    <abstract>We propose a semi-automatic method for the acquisition of specialised
	ontological and terminological knowledge. An ontology and a terminology are
	automatically built from domain experts' annotations. The ontology formalizes
	the common and shared conceptual vocabulary of those experts. Its associated
	terminology defines a glossary linking annotated terms to their semantic
	categories. These two resources evolve incrementally and are used for an
	automatic annotation of a new corpus at each iteration. The annotated corpus
	concerns the evaluation of French higher education and science institutions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sadoun:2016:Computerm2016</bibkey>
  </paper>

  <paper id="4714">
    <title>Understanding Medical free text: A Terminology driven approach</title>
    <author><first>Santosh Sai</first><last>Krishna</last></author>
    <author><first>Manoj</first><last>Hans</last></author>
    <booktitle>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>121&#8211;125</pages>
    <url>http://aclweb.org/anthology/W16-4714</url>
    <abstract>With many hospitals digitalizing clinical records, opportunities have opened
	for researchers in NLP and machine learning to apply techniques for extracting
	meaning and deriving actionable insights. There have been previous attempts at
	mapping free text to medical nomenclatures like UMLS and SNOMED. In this
	paper, however, we analyze diagnoses in clinical reports using ICD10 to
	achieve lightweight, real-time predictions by introducing concepts like
	WordInfo and root word identification. We were able to achieve 68.3% accuracy
	over clinical records collected from qualified clinicians. Our study would
	further help healthcare institutes in organizing their clinical reports based
	on ICD10 mappings and in deriving numerous insights to achieve operational
	efficiency and better medical care.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>krishna-hans:2016:Computerm2016</bibkey>
  </paper>

</volume>