<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W16">
  <paper id="5400">
    <title>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</title>
    <editor>Koiti Hasida</editor>
    <editor>Kam-Fai Wong</editor>
    <editor>Nicoletta Calzolari</editor>
    <editor>Key-Sun Choi</editor>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <url>http://aclweb.org/anthology/W16-54</url>
    <bibtype>book</bibtype>
    <bibkey>ALR12:2016</bibkey>
  </paper>

  <paper id="5401">
    <title>An extension of ISO-Space for annotating object direction</title>
    <author><first>Daiki</first><last>Gotou</last></author>
    <author><first>Hitoshi</first><last>Nishikawa</last></author>
    <author><first>Takenobu</first><last>Tokunaga</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>1&#8211;9</pages>
    <url>http://aclweb.org/anthology/W16-5401</url>
    <abstract>In this paper, we extend the existing annotation scheme ISO-Space to annotate
	the spatial information necessary for the task of placing a specified object at a
	specified location with a specified direction according to a natural language
	instruction. We call this task the spatial placement problem. Our extension
	particularly focuses on describing the object's direction when the object is
	placed on a 2D plane. We conducted an annotation experiment in which a corpus
	of 20 situated dialogues was annotated. The annotation result showed that the
	number of tags newly introduced by our proposal is not negligible. We also
	implemented an analyser that automatically assigns the proposed tags to the
	corpus and evaluated its performance. The results showed that performance for
	entity tags was quite high, ranging from 0.68 to 0.99 in F-measure, but this was
	not the case for relation tags, which scored less than 0.4 in F-measure.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gotou-nishikawa-tokunaga:2016:ALR12</bibkey>
  </paper>

  <paper id="5402">
    <title>Annotation and Analysis of Discourse Relations, Temporal Relations and Multi-Layered Situational Relations in Japanese Texts</title>
    <author><first>Kimi</first><last>Kaneko</last></author>
    <author><first>Saku</first><last>Sugawara</last></author>
    <author><first>Koji</first><last>Mineshima</last></author>
    <author><first>Daisuke</first><last>Bekki</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>10&#8211;19</pages>
    <url>http://aclweb.org/anthology/W16-5402</url>
    <abstract>This paper proposes a methodology for building a specialized Japanese data set
	for recognizing temporal relations and discourse relations.
	In addition to temporal and discourse relations, multi-layered situational
	relations that distinguish generic and specific states belonging to different
	layers in a discourse are annotated.
	Our methodology has been applied to 170 text fragments taken from Wikinews
	articles in Japanese.
	The validity of our methodology is evaluated and analyzed
	in terms of the degree of annotator agreement and the frequency of errors.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kaneko-EtAl:2016:ALR12</bibkey>
  </paper>

  <paper id="5403">
    <title>Developing Universal Dependencies for Mandarin Chinese</title>
    <author><first>Herman</first><last>Leung</last></author>
    <author><first>Rafa&#235;l</first><last>Poiret</last></author>
    <author><first>Tak-sum</first><last>Wong</last></author>
    <author><first>Xinying</first><last>Chen</last></author>
    <author><first>Kim</first><last>Gerdes</last></author>
    <author><first>John</first><last>Lee</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>20&#8211;29</pages>
    <url>http://aclweb.org/anthology/W16-5403</url>
    <abstract>This article proposes a Universal Dependencies annotation scheme for Mandarin
	Chinese, including POS tags and dependency analysis. We identify idiosyncrasies
	of Mandarin Chinese that are difficult to fit into the current scheme, which has
	mainly been based on descriptions of various Indo-European languages. We
	discuss differences between our scheme and those of the Stanford Chinese
	Dependencies and the Chinese Dependency Treebank.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>leung-EtAl:2016:ALR12</bibkey>
  </paper>

  <paper id="5404">
    <title>Developing Corpus of Lecture Utterances Aligned to Slide Components</title>
    <author><first>Ryo</first><last>Minamiguchi</last></author>
    <author><first>Masatoshi</first><last>Tsuchiya</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>30&#8211;37</pages>
    <url>http://aclweb.org/anthology/W16-5404</url>
    <abstract>The approach that formulates automatic text summarization as a maximum
	coverage problem with a knapsack constraint over a set of textual units and a
	set of weighted conceptual units is promising. However, determining the
	appropriate granularity of conceptual units for this formulation is both
	important and difficult. To resolve this problem, we examine using the
	components of presentation slides as conceptual units for generating summaries
	of lecture utterances, instead of other possible conceptual units such as base
	noun phrases or important nouns. This paper describes the corpus we are
	developing to evaluate our proposed approach, which consists of presentation
	slides and lecture utterances aligned to presentation slide components.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>minamiguchi-tsuchiya:2016:ALR12</bibkey>
  </paper>

  <paper id="5405">
    <title>VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization</title>
    <author><first>Minh-Tien</first><last>Nguyen</last></author>
    <author><first>Dac Viet</first><last>Lai</last></author>
    <author><first>Phong-Khac</first><last>Do</last></author>
    <author><first>Duc-Vu</first><last>Tran</last></author>
    <author><first>Minh-Le</first><last>Nguyen</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>38&#8211;48</pages>
    <url>http://aclweb.org/anthology/W16-5405</url>
    <abstract>This paper presents VSoLSCSum, a Vietnamese linked sentence-comment dataset,
	which was manually created to address the lack of standard corpora for social
	context summarization in Vietnamese. The dataset was collected using the
	keywords of 141 Web documents covering 12 special events mentioned on
	Vietnamese Web pages. Social users were asked to participate in creating
	reference summaries and labeling each sentence or comment. Inter-annotator
	agreement among raters after validation, measured by Cohen's Kappa, is 0.685.
	To illustrate the potential use of our dataset, a learning-to-rank method was
	trained using a set of local and social features. Experimental results indicate
	that the summary model trained on our dataset outperforms state-of-the-art
	baselines in both ROUGE-1 and ROUGE-2 for social context summarization.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nguyen-EtAl:2016:ALR12</bibkey>
  </paper>

  <paper id="5406">
    <title>BCCWJ-DepPara: A Syntactic Annotation Treebank on the ‘Balanced Corpus of Contemporary Written Japanese’</title>
    <author><first>Masayuki</first><last>Asahara</last></author>
    <author><first>Yuji</first><last>Matsumoto</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>49&#8211;58</pages>
    <url>http://aclweb.org/anthology/W16-5406</url>
    <abstract>Paratactic syntactic structures are difficult to represent in syntactic
	dependency tree structures. We therefore propose an annotation schema for
	syntactic dependency annotation of Japanese, in which coordinate structures are
	split from and overlaid on bunsetsu-based (base phrase unit) dependency. The
	schema represents nested coordinate structures, non-constituent conjuncts, and
	forward sharing as sets of regions. The annotation was performed on the core
	data of the ‘Balanced Corpus of Contemporary Written Japanese’, which
	comprises about one million words and 1,980 samples from six registers, such
	as newspapers, books, magazines, and web texts.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>asahara-matsumoto:2016:ALR12</bibkey>
  </paper>

  <paper id="5407">
    <title>SCTB: A Chinese Treebank in Scientific Domain</title>
    <author><first>Chenhui</first><last>Chu</last></author>
    <author><first>Toshiaki</first><last>Nakazawa</last></author>
    <author><first>Daisuke</first><last>Kawahara</last></author>
    <author><first>Sadao</first><last>Kurohashi</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>59&#8211;67</pages>
    <url>http://aclweb.org/anthology/W16-5407</url>
    <abstract>Treebanks are crucial for natural language processing (NLP). In this paper, we
	present our work on annotating a Chinese treebank in the scientific domain
	(SCTB) to address the lack of Chinese treebanks in this domain. Chinese
	analysis and machine translation experiments conducted using this treebank
	indicate that the annotated treebank can significantly improve performance on
	both tasks. The treebank is released to promote Chinese NLP research in the
	scientific domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chu-EtAl:2016:ALR12</bibkey>
  </paper>

  <paper id="5408">
    <title>Big Community Data before World Wide Web Era</title>
    <author><first>Tomoya</first><last>Iwakura</last></author>
    <author><first>Tetsuro</first><last>Takahashi</last></author>
    <author><first>Akihiro</first><last>Ohtani</last></author>
    <author><first>Kunio</first><last>Matsui</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>68&#8211;72</pages>
    <url>http://aclweb.org/anthology/W16-5408</url>
    <abstract>This paper introduces the NIFTY-Serve corpus, a large data archive collected
	from Japanese discussion forums that operated via a Bulletin Board System (BBS)
	between 1987 and 2006. The corpus can be used in Artificial Intelligence
	research areas such as Natural Language Processing and Community Analysis.
	The NIFTY-Serve corpus differs from data on the WWW in three ways: (1) it is
	essentially spam- and duplication-free because of strict data collection
	procedures, (2) it is historic user-generated data predating the WWW, and (3)
	it is a complete data set because the service has now shut down. We also
	introduce some examples of uses of the corpus.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>iwakura-EtAl:2016:ALR12</bibkey>
  </paper>

  <paper id="5409">
    <title>An Overview of BPPT's Indonesian Language Resources</title>
    <author><first>Gunarso</first><last>Gunarso</last></author>
    <author><first>Hammam</first><last>Riza</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>73&#8211;77</pages>
    <url>http://aclweb.org/anthology/W16-5409</url>
    <abstract>This paper describes the various Indonesian language resources that the Agency
	for the Assessment and Application of Technology (BPPT) has developed and
	collected since the mid-1980s, when we joined MMTS (Multilingual Machine
	Translation System), an international project coordinated by CICC-Japan to
	develop a machine translation system for five Asian languages (Bahasa
	Indonesia, Malay, Thai, Japanese, and Chinese). Since then, we have been
	actively conducting research in statistical machine translation, speech
	recognition, and speech synthesis, which requires many text and speech corpora.
	Our most recent cooperation, within ASEAN-IVO, is the development of the
	Indonesian ALT (Asian Language Treebank), which has added new NLP tools.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gunarso-riza:2016:ALR12</bibkey>
  </paper>

  <paper id="5410">
    <title>Creating Japanese Political Corpus from Local Assembly Minutes of 47 prefectures</title>
    <author><first>Yasutomo</first><last>Kimura</last></author>
    <author><first>Keiichi</first><last>Takamaru</last></author>
    <author><first>Takuma</first><last>Tanaka</last></author>
    <author><first>Akio</first><last>Kobayashi</last></author>
    <author><first>Hiroki</first><last>Sakaji</last></author>
    <author><first>Yuzu</first><last>Uchida</last></author>
    <author><first>Hokuto</first><last>Ototake</last></author>
    <author><first>Shigeru</first><last>Masuyama</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>78&#8211;85</pages>
    <url>http://aclweb.org/anthology/W16-5410</url>
    <abstract>This paper describes a Japanese political corpus created for interdisciplinary
	political research.
	The corpus contains the local assembly minutes of 47 prefectures from April
	2011 to March 2015.
	This four-year period coincides with the term of office of assembly members in
	most local governments.
	We analyze statistical data, such as the number of speakers, characters, and
	words, to clarify the characteristics of local assembly minutes.
	In addition, we identify problems associated with the different web services
	used by the local governments to make the minutes available to the public.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kimura-EtAl:2016:ALR12</bibkey>
  </paper>

  <paper id="5411">
    <title>Selective Annotation of Sentence Parts: Identification of Relevant Sub-sentential Units</title>
    <author><first>Ge</first><last>Xu</last></author>
    <author><first>Xiaoyan</first><last>Yang</last></author>
    <author><first>Chu-Ren</first><last>Huang</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>86&#8211;94</pages>
    <url>http://aclweb.org/anthology/W16-5411</url>
    <abstract>Many NLP tasks involve sentence-level annotation, yet the relevant information
	is encoded not at the sentence level but in certain relevant parts of the
	sentence. Such tasks include, but are not limited to, sentiment expression
	annotation, product feature annotation, and template annotation for Q&#38;A
	systems. However, annotating the full corpus sentence by sentence is
	resource-intensive. In this paper, we propose an approach that iteratively
	extracts frequent parts of sentences for annotation and compresses the set of
	sentences after each round of annotation. Our approach can also be used to
	prepare training sentences for binary classification (domain-related vs.
	noise, subjectivity vs. objectivity, etc.), assuming that sentence-type
	annotation can be predicted from annotation of the most relevant
	sub-sentences. Two experiments are performed to test our proposal, evaluated
	in terms of time saved and annotation agreement.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>xu-yang-huang:2016:ALR12</bibkey>
  </paper>

  <paper id="5412">
    <title>The Kyutech corpus and topic segmentation using a combined method</title>
    <author><first>Takashi</first><last>Yamamura</last></author>
    <author><first>Kazutaka</first><last>Shimada</last></author>
    <author><first>Shintaro</first><last>Kawahara</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>95&#8211;104</pages>
    <url>http://aclweb.org/anthology/W16-5412</url>
    <abstract>Summarization of multi-party conversation is an important task in natural
	language processing.
	In this paper, we describe a Japanese corpus and a topic segmentation task.
	To the best of our knowledge, the corpus is the first Japanese corpus annotated
	for summarization tasks that is freely available to anyone.
	We call it "the Kyutech corpus."
	The corpus records a decision-making task with four participants and contains
	utterances with time information, topic segmentation, and reference summaries.
	As a case study on the corpus, we describe a method combining LCSeg and
	TopicTiling for the topic segmentation task.
	We discuss the effectiveness and the problems of the combined method through
	experiments on the Kyutech corpus.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yamamura-shimada-kawahara:2016:ALR12</bibkey>
  </paper>

  <paper id="5413">
    <title>Automatic Evaluation of Commonsense Knowledge for Refining Japanese ConceptNet</title>
    <author><first>Seiya</first><last>Shudo</last></author>
    <author><first>Rafal</first><last>Rzepka</last></author>
    <author><first>Kenji</first><last>Araki</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>105&#8211;112</pages>
    <url>http://aclweb.org/anthology/W16-5413</url>
    <abstract>In this paper we present two methods for automatically evaluating common sense
	knowledge for Japanese entries in the ConceptNet ontology. Our proposed
	methods utilize a text-mining approach: one with relation clue words and
	WordNet synonyms, and one without. Both methods were tested on a blog corpus.
	The system based on our proposed methods reached relatively high precision
	scores for three relations (MadeOf, UsedFor, AtLocation), comparable with
	previous research using commercial search engines and simpler input. We
	analyze errors and discuss problems of common sense evaluation, both manual
	and automatic, and propose ideas for further improvement.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shudo-rzepka-araki:2016:ALR12</bibkey>
  </paper>

  <paper id="5414">
    <title>SAMER: A Semi-Automatically Created Lexical Resource for Arabic Verbal Multiword Expressions Tokens Paradigm and their Morphosyntactic Features</title>
    <author><first>Mohamed</first><last>Al-Badrashiny</last></author>
    <author><first>Abdelati</first><last>Hawwari</last></author>
    <author><first>Mahmoud</first><last>Ghoneim</last></author>
    <author><first>Mona</first><last>Diab</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>113&#8211;122</pages>
    <url>http://aclweb.org/anthology/W16-5414</url>
    <abstract>Although MWEs are relatively fixed expressions morphologically and
	syntactically, several types of flexibility can be observed in MWEs, verbal
	MWEs in particular. Identifying the degree of morphological and syntactic
	flexibility of MWEs is very important for many lexicographic and NLP tasks.
	Adding MWE variants/tokens to a dictionary resource requires characterizing
	this flexibility among other morphosyntactic features. Carrying out the task
	manually faces several challenges: it is very laborious in terms of time and
	effort, and it suffers from limited coverage. The problem is exacerbated in
	morphologically rich languages, where the average word in Arabic can have 12
	possible inflected forms. Accordingly, in this paper we introduce a
	semi-automatically created Arabic multiword expressions resource (SAMER). We
	propose an automated method that identifies the morphological and syntactic
	flexibility of Arabic Verbal Multiword Expressions (AVMWE). All observed
	morphological variants and syntactic pattern alternations of an AVMWE are
	automatically acquired from large-scale corpora. We investigate three
	morphosyntactic aspects of AVMWE types, covering derivational and inflectional
	variation and syntactic templates, namely: 1) inflectional variation
	(inflectional paradigm) and degree of flexibility; 2) derivational
	productivity; and 3) identification and classification of the different
	syntactic types. We build a comprehensive list of AVMWE. Every token in the
	AVMWE list is lemmatized and tagged with POS information. We then search the
	Arabic Gigaword and all ATBs for all possible flexible matches. For each AVMWE
	type we generate: a) a statistically ranked list of MWE-lexeme inflections and
	syntactic pattern alternations; b) an abstract syntactic template; and c) the
	most frequent form. Our technique is validated using a gold-standard annotated
	MWE list. The results show that the quality of the generated resource is
	80.04%.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>albadrashiny-EtAl:2016:ALR12</bibkey>
  </paper>

  <paper id="5415">
    <title>Sentiment Analysis for Low Resource Languages: A Study on Informal Indonesian Tweets</title>
    <author><first>Tuan Anh</first><last>Le</last></author>
    <author><first>David</first><last>Moeljadi</last></author>
    <author><first>Yasuhide</first><last>Miura</last></author>
    <author><first>Tomoko</first><last>Ohkuma</last></author>
    <booktitle>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>123&#8211;131</pages>
    <url>http://aclweb.org/anthology/W16-5415</url>
    <abstract>This paper describes our attempt to build a sentiment analysis system for
	Indonesian tweets. With this system, we can computationally study and identify
	sentiments and opinions in a text or document. We used four thousand manually
	labeled tweets collected in February and March 2016 to build the model. Because
	of the variety of content in tweets, we classify tweets into eight groups in
	total, including pos(itive), neg(ative), and neu(tral). Finally, we obtained
	73.2% accuracy with a Long Short-Term Memory (LSTM) model without a normalizer.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>le-EtAl:2016:ALR12</bibkey>
  </paper>

</volume>

