<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W16">
  <paper id="5200">
    <title>Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)</title>
    <editor>Yohei Murakami</editor>
    <editor>Donghui Lin</editor>
    <editor>Nancy Ide</editor>
    <editor>James Pustejovsky</editor>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <url>http://aclweb.org/anthology/W16-52</url>
    <bibtype>book</bibtype>
    <bibkey>WLSI-OIAF4HLT2016:2016</bibkey>
  </paper>

  <paper id="5201">
    <title>Kathaa : NLP Systems as Edge-Labeled Directed Acyclic MultiGraphs</title>
    <author><first>Sharada</first><last>Mohanty</last></author>
    <author><first>Nehal J</first><last>Wani</last></author>
    <author><first>Manish</first><last>Srivastava</last></author>
    <author><first>Dipti</first><last>Sharma</last></author>
    <booktitle>Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>1&#8211;10</pages>
    <url>http://aclweb.org/anthology/W16-5201</url>
    <abstract>We present Kathaa, an Open Source web-based Visual Programming Framework for
	Natural Language Processing (NLP) Systems. Kathaa supports the design,
	execution and analysis of complex NLP systems by visually connecting NLP
	components from an easily extensible Module Library. It models NLP systems as an
	edge-labeled Directed Acyclic MultiGraph, and lets the user use publicly
	co-created modules in their own NLP applications irrespective of their
	technical proficiency in Natural Language Processing. Kathaa exposes an
	intuitive web based Interface for the users to interact with and modify complex
	NLP Systems; and a precise Module definition API to allow easy integration of
	new state of the art NLP components. Kathaa enables researchers to publish
	their services in a standardized format to enable the masses to use their
	services out of the box. The vision of this work is to pave the way for a
	system like Kathaa, to be the Lego blocks of NLP Research and Applications. As
	a practical use case we use Kathaa to visually implement the Sampark
	Hindi-Panjabi Machine Translation Pipeline and the Sampark Hindi-Urdu Machine
	Translation Pipeline, to demonstrate the fact that Kathaa can handle really
	complex NLP systems while still being intuitive for the end user.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mohanty-EtAl:2016:WLSI-OIAF4HLT2016</bibkey>
  </paper>

  <paper id="5202">
    <title>LAPPS/Galaxy: Current State and Next Steps</title>
    <author><first>Nancy</first><last>Ide</last></author>
    <author><first>Keith</first><last>Suderman</last></author>
    <author><first>Eric</first><last>Nyberg</last></author>
    <author><first>James</first><last>Pustejovsky</last></author>
    <author><first>Marc</first><last>Verhagen</last></author>
    <booktitle>Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>11&#8211;18</pages>
    <url>http://aclweb.org/anthology/W16-5202</url>
    <abstract>The US National Science Foundation (NSF) SI2-funded LAPPS/Galaxy project has
	developed an open-source platform for enabling complex analyses while hiding
	complexities associated with underlying infrastructure, that can be accessed
	through a web interface, deployed on any Unix system, or run from the cloud. It
	provides sophisticated tool integration and history capabilities, a workflow
	system for building automated multi-step analyses, state-of-the-art evaluation
	capabilities, and facilities for sharing and publishing analyses. This paper
	describes the current facilities available in LAPPS/Galaxy and outlines the
	project’s ongoing activities to enhance the framework.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ide-EtAl:2016:WLSI-OIAF4HLT2016</bibkey>
  </paper>

  <paper id="5203">
    <title>Automatic Analysis of Flaws in Pre-Trained NLP Models</title>
    <author><first>Richard</first><last>Eckart de Castilho</last></author>
    <booktitle>Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>19&#8211;27</pages>
    <url>http://aclweb.org/anthology/W16-5203</url>
    <abstract>Most tools for natural language processing today are based on machine learning
	and come with pre-trained models. In addition, third-parties provide
	pre-trained models for popular NLP tools. The predictive power and accuracy of
	these tools depend on the quality of these models. Downstream researchers
	often base their results on pre-trained models instead of training their own.
	Consequently, pre-trained models are an essential resource to our community.
	However, to the best of our knowledge, no systematic study of pre-trained models
	has been conducted so far.
	This paper reports on the analysis of 274 pre-trained models for six NLP tools and four
	potential causes of problems: encoding, tokenization, normalization and change
	over time. The analysis is implemented in the open source tool Model
	Investigator. Our work 1) allows model consumers to better assess whether a
	model is suitable for their task, 2) enables tool and model creators to
	sanity-check their models before distributing them, and 3) enables improvements
	in tool interoperability by performing automatic adjustments of normalization
	or other pre-processing based on the models used.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>eckartdecastilho:2016:WLSI-OIAF4HLT2016</bibkey>
  </paper>

  <paper id="5204">
    <title>Combining Human Inputters and Language Services to provide Multi-language support system for International Symposiums</title>
    <author><first>Takao</first><last>Nakaguchi</last></author>
    <author><first>Masayuki</first><last>Otani</last></author>
    <author><first>Toshiyuki</first><last>Takasaki</last></author>
    <author><first>Toru</first><last>Ishida</last></author>
    <booktitle>Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>28&#8211;35</pages>
    <url>http://aclweb.org/anthology/W16-5204</url>
    <abstract>In this research, we introduce and implement a method that combines human
	inputters and machine translators. When the languages of the participants vary
	widely, the cost of simultaneous translation becomes very high. However, the
	results of simply applying machine translation to speech text do not have the
	quality that is needed for real use.
	Thus, we propose a method in which people who understand the language of the
	speaker cooperate with a machine translation service in support of
	multilingualization by the co-creation of value.
	We implement a system with this method and apply it to actual presentations.
	While the quality of direct machine translations is 1.84 (fluency) and 2.89
	(adequacy), the system has corresponding values of 3.76 and 3.85.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nakaguchi-EtAl:2016:WLSI-OIAF4HLT2016</bibkey>
  </paper>

  <paper id="5205">
    <title>Recurrent Neural Network with Word Embedding for Complaint Classification</title>
    <author><first>Panuwat</first><last>Assawinjaipetch</last></author>
    <author><first>Kiyoaki</first><last>Shirai</last></author>
    <author><first>Virach</first><last>Sornlertlamvanich</last></author>
    <author><first>Sanparith</first><last>Marukata</last></author>
    <booktitle>Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>36&#8211;43</pages>
    <url>http://aclweb.org/anthology/W16-5205</url>
    <abstract>Complaint classification aims at using information to deliver greater insights
	to enhance user experience after purchasing the products or services.
	Categorized information can help us quickly collect emerging problems in order
	to provide the support needed. Indeed, responding to a complaint without
	delay will give users the highest satisfaction. In this paper, we aim to deliver a
	novel approach which can clarify the complaints precisely with the aim to
	classify each complaint into nine predefined classes, i.e., accessibility,
	company brand, competitors, facilities, process, product feature, staff
	quality, timing, and others. Given the idea that one word usually
	conveys ambiguity and it has to be interpreted by its context, the word
	embedding technique is used to provide word features while applying deep
	learning techniques for classifying a type of complaints. The dataset we use
	contains 8,439 complaints of one company.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>assawinjaipetch-EtAl:2016:WLSI-OIAF4HLT2016</bibkey>
  </paper>

  <paper id="5206">
    <title>Universal dependencies for Uyghur</title>
    <author><first>Marhaba</first><last>Eli</last></author>
    <author><first>Weinila</first><last>Mushajiang</last></author>
    <author><first>Tuergen</first><last>Yibulayin</last></author>
    <author><first>Kahaerjiang</first><last>Abiderexiti</last></author>
    <author><first>Yan</first><last>Liu</last></author>
    <booktitle>Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>44&#8211;50</pages>
    <url>http://aclweb.org/anthology/W16-5206</url>
    <abstract>The Universal Dependencies (UD) Project seeks to build cross-lingual studies
	of treebanks, linguistic structures and parsing. Its goal is to create a set of
	multilingual harmonized treebanks that are designed according to a universal
	annotation scheme. In this paper, we report on the conversion of the Uyghur
	dependency treebank to a UD version of the treebank which we term the Uyghur
	Universal Dependency Treebank (UyDT). We present the mapping of the Uyghur
	dependency treebank’s labelling scheme to the UD scheme, along with a clear
	description of the structural changes required in this conversion.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>eli-EtAl:2016:WLSI-OIAF4HLT2016</bibkey>
  </paper>

  <paper id="5207">
    <title>A non-expert Kaldi recipe for Vietnamese Speech Recognition System</title>
    <author><first>Hieu-Thi</first><last>Luong</last></author>
    <author><first>Hai-Quan</first><last>Vu</last></author>
    <booktitle>Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>51&#8211;55</pages>
    <url>http://aclweb.org/anthology/W16-5207</url>
    <abstract>In this paper we describe a non-expert setup for a Vietnamese speech recognition
	system using the Kaldi toolkit. We collected a speech corpus of over fifteen hours
	from about fifty native Vietnamese speakers and used it to test the feasibility
	of our setup. The essential linguistic components for the Automatic Speech
	Recognition (ASR) system were prepared based on the written form of the
	language instead of expert knowledge of linguistics and phonology, as is commonly
	seen in resource-rich languages like English. The modeling of tones by
	integrating them into the phoneme and using the phonetic decision tree is also
	discussed. Experimental results showed that this setup for ASR systems yields
	competitive results while still having potential for further improvement.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>luong-vu:2016:WLSI-OIAF4HLT2016</bibkey>
  </paper>

  <paper id="5208">
    <title>Evaluating Ensemble Based Pre-annotation on Named Entity Corpus Construction in English and Chinese</title>
    <author><first>Tingming</first><last>Lu</last></author>
    <author><first>Man</first><last>Zhu</last></author>
    <author><first>Zhiqiang</first><last>Gao</last></author>
    <author><first>Yaocheng</first><last>Gui</last></author>
    <booktitle>Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>56&#8211;60</pages>
    <url>http://aclweb.org/anthology/W16-5208</url>
    <abstract>Annotated corpora are crucial language resources, and pre-annotation is a
	usual way to reduce the cost of corpus construction. The ensemble based
	pre-annotation approach combines multiple existing named entity taggers and
	categorizes annotations into normal annotations with high confidence and
	candidate annotations with low confidence, to reduce the human annotation time.
	In this paper, we manually annotate three English datasets under various
	pre-annotation conditions, report the effects of ensemble based pre-annotation,
	and analyze the experimental results. In order to verify the effectiveness of
	ensemble based pre-annotation in other languages, such as Chinese, three
	Chinese datasets are also tested. The experimental results show that the
	ensemble based pre-annotation approach significantly reduces the number of
	annotations which human annotators have to add, and outperforms the baseline
	approaches in reduction of human annotation time without loss in annotation
	performance (in terms of F1-measure), on both English and Chinese datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lu-EtAl:2016:WLSI-OIAF4HLT2016</bibkey>
  </paper>

  <paper id="5209">
    <title>An Ontology for Language Service Composability</title>
    <author><first>Yohei</first><last>Murakami</last></author>
    <author><first>Takao</first><last>Nakaguchi</last></author>
    <author><first>Donghui</first><last>Lin</last></author>
    <author><first>Toru</first><last>Ishida</last></author>
    <booktitle>Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>61&#8211;69</pages>
    <url>http://aclweb.org/anthology/W16-5209</url>
    <abstract>Fragmentation and recombination is key to creating customized language
	environments for supporting various intercultural activities. Fragmentation
	provides various language resource components for the customized language
	environments and recombination builds each language environment according to
	the user's request by combining these components. To realize this fragmentation and
	recombination process, existing language resources (both data and programs)
	should be shared as language services and combined despite mismatches between their
	service interfaces. To address this issue, standardization is inevitable:
	standardized interfaces are necessary for language services as well as data
	format required for language resources. Therefore, we have constructed a
	hierarchy of language services based on inheritance of service interfaces,
	which is called language service ontology. This ontology allows users to create
	a new customized language service that is compatible with existing ones.
	Moreover, we have developed a dynamic service binding technology that
	instantiates various executable customized services from an abstract workflow
	according to the user's request. By using the ontology and service binding
	together, users can bind the instantiated language service to another abstract
	workflow for a new customized one.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>murakami-EtAl:2016:WLSI-OIAF4HLT2016</bibkey>
  </paper>

  <paper id="5210">
    <title>Between Platform and APIs: Kachako API for Developers</title>
    <author><first>Yoshinobu</first><last>Kano</last></author>
    <booktitle>Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>70&#8211;75</pages>
    <url>http://aclweb.org/anthology/W16-5210</url>
    <abstract>Different types of users require different functions in NLP software. It is
	difficult for a single platform to cover all types of users. When a framework
	aims to provide more interoperability, users are required to learn more
	concepts; users' application designs are restricted to be compliant with the
	framework. While an interoperability framework is useful in certain cases, some
	types of users will not select the framework due to the learning cost and
	design restrictions. We suggest a rather simple interoperability framework
	aimed at developers. Reusing an existing NLP platform, Kachako, we created an
	API-oriented NLP system. This system loosely couples rich high-end functions,
	including annotation visualizations, statistical evaluations, annotation
	searching, etc. This API does not impose much learning cost on users, providing
	customization ability for power users while also allowing casual users to employ
	many GUI functions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kano:2016:WLSI-OIAF4HLT2016</bibkey>
  </paper>

</volume>

