<?xml version="1.0" encoding="UTF-8" ?>
<volume id="I17">
  <paper id="1000">
    <title>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</title>
    <editor>Greg Kondrak</editor>
    <editor>Taro Watanabe</editor>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <url>http://www.aclweb.org/anthology/I17-1</url>
    <bibtype>book</bibtype>
    <bibkey>I17-1:2017</bibkey>
  </paper>

  <paper id="1001">
    <title>Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks</title>
    <author><first>Yonatan</first><last>Belinkov</last></author>
    <author><first>Llu&#237;s</first><last>M&#224;rquez</last></author>
    <author><first>Hassan</first><last>Sajjad</last></author>
    <author><first>Nadir</first><last>Durrani</last></author>
    <author><first>Fahim</first><last>Dalvi</last></author>
    <author><first>James</first><last>Glass</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>1&#8211;10</pages>
    <url>http://www.aclweb.org/anthology/I17-1001</url>
    <abstract>While neural machine translation (NMT) models provide improved translation
	quality in an elegant framework, it is less clear what they learn about
	language. Recent work has started evaluating the quality of vector
	representations learned by NMT models on morphological and syntactic tasks. In
	this paper, we investigate the representations learned at different layers of
	NMT encoders. We train NMT systems on parallel data and use the models to
	extract features for training a classifier on two tasks: part-of-speech and
	semantic tagging. We then measure the performance of the classifier as a proxy
	for the quality of the original NMT model for the given task. Our quantitative
	analysis yields interesting insights regarding representation learning in NMT
	models. For instance, we find that higher layers are better at learning
	semantics while lower layers tend to be better for part-of-speech tagging. We
	also observe little effect of the target language on source-side
	representations, especially in higher quality models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>belinkov-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1002">
    <title>Context-Aware Smoothing for Neural Machine Translation</title>
    <author><first>Kehai</first><last>Chen</last></author>
    <author><first>Rui</first><last>Wang</last></author>
    <author><first>Masao</first><last>Utiyama</last></author>
    <author><first>Eiichiro</first><last>Sumita</last></author>
    <author><first>Tiejun</first><last>Zhao</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>11&#8211;20</pages>
    <url>http://www.aclweb.org/anthology/I17-1002</url>
    <abstract>In Neural Machine Translation (NMT), each word is represented as a
	low-dimensional, real-valued vector that encodes its syntactic and semantic
	information. This means that even when a word appears in different sentence
	contexts, it is represented by the same fixed vector when learning the source
	representation. Moreover, a large number of Out-Of-Vocabulary (OOV) words,
	which have different syntactic and semantic information, are represented by
	the same vector representation of "unk". To alleviate this problem, we propose
	a novel context-aware smoothing method to dynamically learn a
	sentence-specific vector for each word (including OOV words) depending on its
	local context words in a sentence. The learned context-aware representation is
	integrated into the NMT model to improve translation performance. Empirical
	results on the NIST Chinese-to-English translation task show that the proposed
	approach achieves a 1.78 BLEU point improvement on average over a strong
	attentional NMT system, and outperforms some existing systems.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chen-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1003">
    <title>Improving Sequence to Sequence Neural Machine Translation by Utilizing Syntactic Dependency Information</title>
    <author><first>An</first><last>Nguyen Le</last></author>
    <author><first>Ander</first><last>Martinez</last></author>
    <author><first>Akifumi</first><last>Yoshimoto</last></author>
    <author><first>Yuji</first><last>Matsumoto</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>21&#8211;29</pages>
    <url>http://www.aclweb.org/anthology/I17-1003</url>
    <dataset>I17-1003.Datasets.zip</dataset>
    <abstract>Sequence to Sequence Neural Machine Translation has achieved strong
	performance in recent years. Yet there are some existing issues that Neural
	Machine Translation still does not solve completely. Two of them are
	the translation of long sentences and over-translation. To address
	these two problems, we propose an approach that utilizes more grammatical
	information, such as syntactic dependencies, so that the output can be
	generated based on richer information. In our approach, syntactic dependencies
	are employed in decoding. In addition, the output of the model is presented
	not as a simple sequence of tokens but as a linearized tree construction. In
	order to assess the performance, we construct a model based on an
	attention-based encoder-decoder model in which the source language is input to
	the encoder as a sequence and the decoder generates the target language as a
	linearized dependency tree structure. Experiments on the Europarl-v7 dataset
	for French-to-English translation demonstrate that our proposed method
	improves BLEU scores by 1.57 and 2.40 on datasets consisting of sentences with
	up to 50 and 80 tokens, respectively. Furthermore, the proposed method also
	solves the two existing problems, ineffective translation of long sentences
	and over-translation, in Neural Machine Translation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nguyenle-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1004">
    <title>What does Attention in Neural Machine Translation Pay Attention to?</title>
    <author><first>Hamidreza</first><last>Ghader</last></author>
    <author><first>Christof</first><last>Monz</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>30&#8211;39</pages>
    <url>http://www.aclweb.org/anthology/I17-1004</url>
    <abstract>Attention in neural machine translation provides the possibility to encode
	relevant parts of the source sentence at each translation step. As a result,
	attention is considered to be an alignment model as well. However, there is no
	work that specifically studies attention and provides analysis of what is being
	learned by attention models. Thus, the question remains how attention is
	similar to or different from traditional alignment. In this paper, we provide
	a detailed analysis of attention and compare it to traditional alignment. We
	answer the question of whether attention only models translational
	equivalence or whether it captures more information. We show that attention
	differs from alignment in some cases and captures useful information beyond
	alignments.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ghader-monz:2017:I17-1</bibkey>
  </paper>

  <paper id="1005">
    <title>Grammatical Error Detection Using Error- and Grammaticality-Specific Word Embeddings</title>
    <author><first>Masahiro</first><last>Kaneko</last></author>
    <author><first>Yuya</first><last>Sakaizawa</last></author>
    <author><first>Mamoru</first><last>Komachi</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>40&#8211;48</pages>
    <url>http://www.aclweb.org/anthology/I17-1005</url>
    <abstract>In this study, we improve grammatical error detection by learning word
	embeddings that consider grammaticality and error patterns.
	Most existing algorithms for learning word embeddings usually model only the
	syntactic context of words so that classifiers treat erroneous and correct
	words as similar inputs.
	We address the problem of contextual information by considering learner errors.
	Specifically, we propose two models: one model that employs grammatical error
	patterns and another model that considers the grammaticality of the target word.
	We determine the grammaticality of an n-gram sequence from the annotated error
	tags and extract grammatical error patterns for word embeddings from
	large-scale learner corpora.
	Experimental results show that a bidirectional long short-term memory model
	initialized with our word embeddings achieves state-of-the-art accuracy by a
	large margin in an English grammatical error detection task on the First
	Certificate in English dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kaneko-sakaizawa-komachi:2017:I17-1</bibkey>
  </paper>

  <paper id="1006">
    <title>Dependency Parsing with Partial Annotations: An Empirical Comparison</title>
    <author><first>Yue</first><last>Zhang</last></author>
    <author><first>Zhenghua</first><last>Li</last></author>
    <author><first>Jun</first><last>Lang</last></author>
    <author><first>Qingrong</first><last>Xia</last></author>
    <author><first>Min</first><last>Zhang</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>49&#8211;58</pages>
    <url>http://www.aclweb.org/anthology/I17-1006</url>
    <abstract>This paper describes and compares two straightforward approaches for dependency
	parsing with partial annotations (PA). The first approach is based on a
	forest-based training objective for two CRF parsers, i.e., a biaffine neural
	network graph-based parser (Biaffine) and a traditional log-linear graph-based
	parser (LLGPar). The second approach is based on the idea of constrained
	decoding for three parsers, i.e., a traditional linear graph-based parser
	(LGPar), a globally normalized neural network transition-based parser (GN3Par)
	and a traditional linear transition-based parser (LTPar). For the test phase,
	constrained decoding is also used for completing partial trees. We conduct
	experiments on Penn Treebank under three different settings for simulating PA,
	i.e., random, most uncertain, and divergent outputs from the five parsers. The
	results show that LLGPar is the most effective in directly learning from PA,
	and the other parsers achieve their best performance when PAs are completed
	into full trees by LLGPar.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhang-EtAl:2017:I17-11</bibkey>
  </paper>

  <paper id="1007">
    <title>Neural Probabilistic Model for Non-projective MST Parsing</title>
    <author><first>Xuezhe</first><last>Ma</last></author>
    <author><first>Eduard</first><last>Hovy</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>59&#8211;69</pages>
    <url>http://www.aclweb.org/anthology/I17-1007</url>
    <abstract>In this paper, we propose a probabilistic parsing model that defines a proper
	conditional probability distribution over non-projective
	dependency trees for a given sentence, using neural representations as inputs.
	The neural network architecture is based on bi-directional LSTM-CNNs, which
	automatically benefit from both word- and character-level representations
	by using a combination of bidirectional LSTMs and CNNs. On top
	of the neural network, we introduce a probabilistic structured layer, defining
	a conditional log-linear model over non-projective trees. By exploiting
	Kirchhoff’s Matrix-Tree Theorem (Tutte, 1984), the partition functions and
	marginals can be computed efficiently, leading to a straightforward end-to-end
	model training procedure via back-propagation. We evaluate our model on 17
	different datasets, across 14 different languages. Our parser achieves
	state-of-the-art parsing performance on nine datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ma-hovy:2017:I17-1</bibkey>
  </paper>

  <paper id="1008">
    <title>Word Ordering as Unsupervised Learning Towards Syntactically Plausible Word Representations</title>
    <author><first>Noriki</first><last>Nishida</last></author>
    <author><first>Hideki</first><last>Nakayama</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>70&#8211;79</pages>
    <url>http://www.aclweb.org/anthology/I17-1008</url>
    <abstract>The research question we explore in this study is how to obtain syntactically
	plausible word representations without using human annotations.
	Our underlying hypothesis is that word ordering tests, or linearizations, are
	suitable for learning syntactic knowledge about words.
	To verify this hypothesis, we develop a differentiable model called Word
	Ordering Network (WON) that explicitly learns to recover correct word order
	while implicitly acquiring word embeddings representing syntactic knowledge.
	We evaluate the word embeddings produced by the proposed method on downstream
	syntax-related tasks such as part-of-speech tagging and dependency parsing.
	The experimental results demonstrate that the WON consistently outperforms both
	order-insensitive and order-sensitive baselines on these tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nishida-nakayama:2017:I17-1</bibkey>
  </paper>

  <paper id="1009">
    <title>MIPA: Mutual Information Based Paraphrase Acquisition via Bilingual Pivoting</title>
    <author><first>Tomoyuki</first><last>Kajiwara</last></author>
    <author><first>Mamoru</first><last>Komachi</last></author>
    <author><first>Daichi</first><last>Mochihashi</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>80&#8211;89</pages>
    <url>http://www.aclweb.org/anthology/I17-1009</url>
    <abstract>We present a pointwise mutual information (PMI)-based approach to formalize
	paraphrasability and propose a variant of PMI, called MIPA, for paraphrase
	acquisition.
	Our paraphrase acquisition method first acquires lexical paraphrase pairs by
	bilingual pivoting and then reranks them by PMI and distributional similarity.
	The complementary nature of information from bilingual corpora and from
	monolingual corpora makes the proposed method robust.
	Experimental results show that the proposed method substantially outperforms
	bilingual pivoting and distributional similarity themselves in terms of metrics
	such as MRR, MAP, coverage, and Spearman's correlation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kajiwara-komachi-mochihashi:2017:I17-1</bibkey>
  </paper>

  <paper id="1010">
    <title>Improving Implicit Semantic Role Labeling by Predicting Semantic Frame Arguments</title>
    <author><first>Quynh Ngoc Thi</first><last>Do</last></author>
    <author><first>Steven</first><last>Bethard</last></author>
    <author><first>Marie-Francine</first><last>Moens</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>90&#8211;99</pages>
    <url>http://www.aclweb.org/anthology/I17-1010</url>
    <abstract>Implicit semantic role labeling (iSRL) is the task of predicting the semantic
	roles of a predicate that do not appear as explicit arguments, but rather
	refer to common sense knowledge or are mentioned earlier in the discourse. We
	introduce an approach to iSRL based on a predictive recurrent neural semantic
	frame model (PRNSFM) that uses a large unannotated corpus to learn the
	probability of a sequence of semantic arguments given a predicate. We leverage
	the sequence probabilities predicted by the PRNSFM to estimate selectional
	preferences for predicates and their arguments. On the NomBank iSRL test set,
	our approach improves state-of-the-art performance on implicit semantic role
	labeling with less reliance than prior work on manually constructed
	language resources.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>do-bethard-moens:2017:I17-1</bibkey>
  </paper>

  <paper id="1011">
    <title>Natural Language Inference from Multiple Premises</title>
    <author><first>Alice</first><last>Lai</last></author>
    <author><first>Yonatan</first><last>Bisk</last></author>
    <author><first>Julia</first><last>Hockenmaier</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>100&#8211;109</pages>
    <url>http://www.aclweb.org/anthology/I17-1011</url>
    <abstract>We define a novel textual entailment task that requires inference over multiple
	premise sentences. We present a new dataset for this task that minimizes
	trivial lexical inferences, emphasizes knowledge of everyday events, and
	presents a more challenging setting for textual entailment. We evaluate several
	strong neural baselines and analyze how the multiple premise task differs from
	standard textual entailment.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lai-bisk-hockenmaier:2017:I17-1</bibkey>
  </paper>

  <paper id="1012">
    <title>Enabling Transitivity for Lexical Inference on Chinese Verbs Using Probabilistic Soft Logic</title>
    <author><first>Wei-Chung</first><last>Wang</last></author>
    <author><first>Lun-Wei</first><last>Ku</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>110&#8211;119</pages>
    <url>http://www.aclweb.org/anthology/I17-1012</url>
    <abstract>Enabling transitivity is a vital step for lexical inference to acquire more
	knowledge. However, most of the lexical inference models with good performance
	are designed for nouns or noun phrases, and cannot be directly applied to
	inference on events or states. In this paper, we construct the largest Chinese
	verb lexical inference dataset, containing 18,029 verb pairs, where each pair
	is annotated with one of four inference relations. We further build a
	probabilistic soft logic (PSL) model to infer verb lexicons using the logic
	language. With PSL, we easily enable transitivity in two layers, the observed
	layer and the feature layer, which are included in the knowledge base. We
	further discuss the effect of transitivity within and between these layers.
	Results show that the performance of the proposed PSL model improves by at
	least 3.5% (relative) when transitivity is enabled. Furthermore, experiments
	show that enabling transitivity in the observed layer benefits the most.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wang-ku:2017:I17-1</bibkey>
  </paper>

  <paper id="1013">
    <title>An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing</title>
    <author><first>Marcin</first><last>Junczys-Dowmunt</last></author>
    <author><first>Roman</first><last>Grundkiewicz</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>120&#8211;129</pages>
    <url>http://www.aclweb.org/anthology/I17-1013</url>
    <abstract>In this work, we explore multiple neural architectures adapted for the task of
	automatic post-editing of machine translation output. We focus on neural
	end-to-end models that combine both inputs mt (raw MT output) and src
	(source language input) in a single neural architecture, modeling
	{mt, src} -> pe directly. Apart from that, we investigate the influence
	of hard-attention models, which seem to be well-suited for monolingual tasks,
	as well as combinations of both ideas.
	We report results on data sets provided during the WMT-2016 shared task on
	automatic post-editing and demonstrate that dual-attention models that
	incorporate all available data in the APE scenario in a single model improve on
	the best shared task system and on all other results published after the shared
	task. Dual-attention models that are combined with hard attention remain
	competitive despite applying fewer changes to the input.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>junczysdowmunt-grundkiewicz:2017:I17-1</bibkey>
  </paper>

  <paper id="1014">
    <title>Imagination Improves Multimodal Translation</title>
    <author><first>Desmond</first><last>Elliott</last></author>
    <author><first>&#192;kos</first><last>K&#225;d&#225;r</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>130&#8211;141</pages>
    <url>http://www.aclweb.org/anthology/I17-1014</url>
    <abstract>We decompose multimodal translation into two sub-tasks: learning to translate
	and learning visually grounded representations. In a multitask learning
	framework, translations are learned in an attention-based encoder-decoder, and
	grounded representations are learned through image representation prediction.
	Our approach improves translation performance compared to the state of the art
	on the Multi30K dataset. Furthermore, it is equally effective if we train the
	image prediction task on the external MS COCO dataset, and we find improvements
	if we train the translation model on the external News Commentary parallel
	text.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>elliott-kadar:2017:I17-1</bibkey>
  </paper>

  <paper id="1015">
    <title>Understanding and Improving Morphological Learning in the Neural Machine Translation Decoder</title>
    <author><first>Fahim</first><last>Dalvi</last></author>
    <author><first>Nadir</first><last>Durrani</last></author>
    <author><first>Hassan</first><last>Sajjad</last></author>
    <author><first>Yonatan</first><last>Belinkov</last></author>
    <author><first>Stephan</first><last>Vogel</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>142&#8211;151</pages>
    <url>http://www.aclweb.org/anthology/I17-1015</url>
    <abstract>End-to-end training makes the neural machine translation (NMT) architecture
	simpler, yet more elegant, compared to traditional statistical machine
	translation (SMT). However, little is known about the linguistic patterns of
	morphology, syntax and semantics learned during the training of NMT systems,
	and more importantly, which parts of the architecture are responsible for
	learning each of these phenomena. In this paper we i) analyze how much morphology an NMT decoder
	learns, and ii) investigate whether injecting target morphology in the decoder
	helps it to produce better translations. To this end we present three methods:
	i) simultaneous translation, ii) joint-data learning, and iii) multi-task
	learning. Our results show that explicit morphological information helps the
	 decoder learn target language morphology and improves the translation 
	quality by 0.2&#8211;0.6 BLEU points.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dalvi-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1016">
    <title>Improving Neural Machine Translation through Phrase-based Forced Decoding</title>
    <author><first>Jingyi</first><last>Zhang</last></author>
    <author><first>Masao</first><last>Utiyama</last></author>
    <author><first>Eiichiro</first><last>Sumita</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <author><first>Satoshi</first><last>Nakamura</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>152&#8211;162</pages>
    <url>http://www.aclweb.org/anthology/I17-1016</url>
    <abstract>Compared to traditional statistical machine translation (SMT), neural machine
	translation (NMT) often sacrifices adequacy for the sake of fluency. We propose
	a method to combine the advantages of traditional SMT and NMT by exploiting an
	existing phrase-based SMT model to compute the phrase-based decoding cost for
	an NMT output and then using the phrase-based decoding cost to rerank the
	n-best NMT outputs. The main challenge in implementing this approach is that
	NMT outputs may not be in the search space of the standard phrase-based
	decoding algorithm, because the search space of phrase-based SMT is limited by
	the phrase-based translation rule table. We propose a soft forced decoding
	algorithm, which can always successfully find a decoding path for any NMT
	output. We show that using the forced decoding cost to rerank the NMT outputs
	can successfully improve translation quality on four different language pairs.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhang-EtAl:2017:I17-12</bibkey>
  </paper>

  <paper id="1017">
    <title>Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation</title>
    <author><first>Chunqi</first><last>Wang</last></author>
    <author><first>Bo</first><last>Xu</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>163&#8211;172</pages>
    <url>http://www.aclweb.org/anthology/I17-1017</url>
    <abstract>The character-based sequence labeling framework is flexible and efficient for
	Chinese word segmentation (CWS).
	Recently, many character-based neural models have been applied to CWS. While
	they obtain good performance, they have two obvious weaknesses. The first is
	that they heavily rely on manually designed bigram features, i.e., they are
	not good at capturing n-gram features automatically. The second is that they
	make no use of full word information. For the first weakness, we propose a
	convolutional neural model, which is able to capture rich n-gram features
	without any feature engineering.
	For the second one, we propose an effective approach to integrate the proposed
	model with word embeddings.
	We evaluate the model on two benchmark datasets: PKU and MSR. Without any
	feature engineering, the model obtains competitive performance &#8211; 95.7% on
	PKU and 97.3% on MSR. Armed with word embeddings, the model achieves
	state-of-the-art performance on both datasets &#8211; 96.5% on PKU and 98.0% on
	MSR, without using any external labeled resource.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wang-xu:2017:I17-1</bibkey>
  </paper>

  <paper id="1018">
    <title>Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF</title>
    <author><first>Yan</first><last>Shao</last></author>
    <author><first>Christian</first><last>Hardmeier</last></author>
    <author><first>J&#246;rg</first><last>Tiedemann</last></author>
    <author><first>Joakim</first><last>Nivre</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>173&#8211;183</pages>
    <url>http://www.aclweb.org/anthology/I17-1018</url>
    <software>I17-1018.Software.zip</software>
    <abstract>We present a character-based model for joint segmentation and POS tagging for
	Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is
	adapted and applied with novel vector representations of Chinese characters
	that capture rich contextual information and lower-than-character level
	features. The proposed model is extensively evaluated and compared with a
	state-of-the-art tagger respectively on CTB5, CTB9 and UD Chinese. The
	experimental results indicate that our model is accurate and robust across
	datasets of different sizes, genres and annotation schemes. We obtain
	state-of-the-art performance on CTB5, achieving 94.38 F1-score for joint
	segmentation and POS tagging.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shao-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1019">
    <title>Addressing Domain Adaptation for Chinese Word Segmentation with Global Recurrent Structure</title>
    <author><first>Shen</first><last>Huang</last></author>
    <author><first>Xu</first><last>Sun</last></author>
    <author><first>Houfeng</first><last>Wang</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>184&#8211;193</pages>
    <url>http://www.aclweb.org/anthology/I17-1019</url>
    <abstract>Boundary features are widely used in traditional Chinese Word Segmentation
	(CWS) methods as they can utilize unlabeled data to help improve the
	Out-of-Vocabulary (OOV) word recognition performance. Although various neural
	network methods for CWS have achieved performance competitive with
	state-of-the-art systems, these methods, constrained by the domain and size of
	the training corpus, do not work well in domain adaptation. In this paper, we
	propose a novel BLSTM-based neural network model which incorporates a global
	recurrent structure designed for modeling boundary features dynamically.
	Experiments show that the proposed structure can effectively boost the
	performance of Chinese Word Segmentation, especially OOV-Recall, which brings
	benefits to domain adaptation. We achieve state-of-the-art results on six
	domains of CNKI articles, and results competitive with the best reported on
	the four domains of the SIGHAN Bakeoff 2010 data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>huang-sun-wang:2017:I17-1</bibkey>
  </paper>

  <paper id="1020">
    <title>Information Bottleneck Inspired Method For Chat Text Segmentation</title>
    <author><first>S</first><last>Vishal</last></author>
    <author><first>Mohit</first><last>Yadav</last></author>
    <author><first>Lovekesh</first><last>Vig</last></author>
    <author><first>Gautam</first><last>Shroff</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>194&#8211;203</pages>
    <url>http://www.aclweb.org/anthology/I17-1020</url>
    <dataset>I17-1020.Datasets.zip</dataset>
    <abstract>We present a novel technique for segmenting chat conversations using the
	information bottleneck method (Tishby et al., 2000), augmented with sequential
	continuity constraints. Furthermore, we utilize critical non-textual clues such
	as time between two consecutive posts and people mentions within the posts. To
	ascertain the effectiveness of the proposed method, we have collected data from
	public Slack conversations and Fresco, a proprietary platform deployed inside
	our organization. Experiments demonstrate that the proposed method yields an
	absolute (relative) improvement of up to 3.23% (11.25%). To facilitate
	future research, we are releasing manual annotations for segmentation on public
	Slack conversations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>vishal-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1021">
    <title>Distributional Modeling on a Diet: One-shot Word Learning from Text Only</title>
    <author><first>Su</first><last>Wang</last></author>
    <author><first>Stephen</first><last>Roller</last></author>
    <author><first>Katrin</first><last>Erk</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>204&#8211;213</pages>
    <url>http://www.aclweb.org/anthology/I17-1021</url>
    <abstract>We test whether distributional models can do one-shot learning of definitional
	properties from text only. Using Bayesian models, we find that first learning
	overarching structure in the known data &#8211; regularities in textual contexts and
	in properties &#8211; helps one-shot learning, and that individual context items can
	be highly informative.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wang-roller-erk:2017:I17-1</bibkey>
  </paper>

  <paper id="1022">
    <title>A Computational Study on Word Meanings and Their Distributed Representations via Polymodal Embedding</title>
    <author><first>Joohee</first><last>Park</last></author>
    <author><first>Sung-Hyon</first><last>Myaeng</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>214&#8211;223</pages>
    <url>http://www.aclweb.org/anthology/I17-1022</url>
    <abstract>Distributed representations have become a popular approach to capturing word
	meaning. Besides their success and practical value, however, questions arise
	about the relationships between a true word meaning and its distributed
	representation.
	In this paper, we examine such a relationship via a polymodal embedding
	approach, inspired by the theory that humans tend to use diverse sources in
	developing a word meaning. The results suggest that existing embeddings fail
	to capture certain aspects of word meaning, which can be significantly
	improved by the polymodal approach. We also show distinct characteristics of
	different types of words (e.g., concreteness) via computational studies.
	Finally, we show that our proposed embedding method outperforms the baselines
	on word similarity and hypernym prediction tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>park-myaeng:2017:I17-1</bibkey>
  </paper>

  <paper id="1023">
    <title>Geographical Evaluation of Word Embeddings</title>
    <author><first>Michal</first><last>Konkol</last></author>
    <author><first>Tom&#225;&#x161;</first><last>Brychc&#237;n</last></author>
    <author><first>Michal</first><last>Nykl</last></author>
    <author><first>Tom&#225;&#x161;</first><last>Hercig</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>224&#8211;232</pages>
    <url>http://www.aclweb.org/anthology/I17-1023</url>
    <abstract>Word embeddings are commonly compared either with human-annotated word
	similarities or through improvements in natural language processing tasks. We
	propose a novel principle which compares the information from word embeddings
	with reality. We implement this principle by comparing the information in the
	word embeddings with geographical positions of cities. Our evaluation linearly
	transforms the semantic space to optimally fit the real positions of cities and
	measures the deviation between the position given by word embeddings and the
	real position. A set of well-known word embeddings with state-of-the-art
	results was evaluated. We also introduce a visualization that helps with error
	analysis.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>konkol-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1024">
    <title>On Modeling Sense Relatedness in Multi-prototype Word Embedding</title>
    <author><first>Yixin</first><last>Cao</last></author>
    <author><first>Jiaxin</first><last>Shi</last></author>
    <author><first>Juanzi</first><last>Li</last></author>
    <author><first>Zhiyuan</first><last>Liu</last></author>
    <author><first>Chengjiang</first><last>Li</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>233&#8211;242</pages>
    <url>http://www.aclweb.org/anthology/I17-1024</url>
    <abstract>To enhance the expressive ability of distributional word representation
	learning models, many researchers induce word senses through clustering and
	learn multiple embedding vectors for each word, namely the multi-prototype
	word embedding model. However, most related work ignores the relatedness among
	word senses, which actually plays an important role. In this paper, we propose a
	novel approach to capture word sense relatedness in multi-prototype word
	embedding model. Particularly, we differentiate the original sense and extended
	senses of a word by introducing their global occurrence information and model
	their relatedness through the local textual context information. Based on the
	idea of fuzzy clustering, we introduce a random process to integrate these two
	types of senses and design two non-parametric methods for word sense induction.
	To make our model more scalable and efficient, we use an online joint learning
	framework extended from the Skip-gram model. The experimental results
	demonstrate that our model outperforms both conventional single-prototype
	embedding models and other multi-prototype embedding models, and achieves more
	stable performance when trained on smaller data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>cao-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1025">
    <title>Unsupervised Segmentation of Phoneme Sequences based on Pitman-Yor Semi-Markov Model using Phoneme Length Context</title>
    <author><first>Ryu</first><last>Takeda</last></author>
    <author><first>Kazunori</first><last>Komatani</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>243&#8211;252</pages>
    <url>http://www.aclweb.org/anthology/I17-1025</url>
    <abstract>Unsupervised segmentation of phoneme sequences is an essential process to
	obtain unknown words during spoken dialogues. 
	In this segmentation, an input phoneme sequence without delimiters is converted
	into segmented sub-sequences corresponding to words.
	The Pitman-Yor semi-Markov model (PYSMM) is promising for this problem, but its
	performance degrades when it is applied to phoneme-level word segmentation. 
	This is because of insufficient cues for the segmentation, e.g., homophones are
	improperly treated as single entries and their different contexts are also
	confused. 
	We propose a phoneme-length context model for PYSMM to give a helpful cue at
	the phoneme level and to predict succeeding segments more accurately.
	Our experiments showed that the peak performance with our context model
	exceeded that without such a context model by up to 0.045 in terms of
	F-measure of the estimated segmentation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>takeda-komatani:2017:I17-1</bibkey>
  </paper>

  <paper id="1026">
    <title>A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification</title>
    <author><first>Ye</first><last>Zhang</last></author>
    <author><first>Byron</first><last>Wallace</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>253&#8211;263</pages>
    <url>http://www.aclweb.org/anthology/I17-1026</url>
    <abstract>Convolutional Neural Networks (CNNs) have recently achieved remarkably strong
	performance on the practically important task of sentence classification (Kim,
	2014; Kalchbrenner et al., 2014; Johnson and Zhang, 2014; Zhang et al., 2016).
	However, these models require practitioners to specify an exact model
	architecture and set accompanying hyperparameters, including the filter region
	size, regularization parameters, and so on. It is currently unknown
	how sensitive model performance is to changes in these configurations for the
	task of sentence classification. We thus conduct a sensitivity analysis of
	one-layer CNNs to explore the effect of architecture components on model
	performance; our aim is to distinguish between important
	and comparatively inconsequential design decisions for sentence classification.
	We focus on one-layer CNNs (to the exclusion of more complex models) due to
	their comparative simplicity and strong empirical performance, which make them
	a modern standard baseline method akin to Support Vector Machines (SVMs) and
	logistic regression. We derive practical advice from our extensive empirical
	results for those interested in getting the most out of CNNs for sentence
	classification in real world settings.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhang-wallace:2017:I17-1</bibkey>
  </paper>

  <paper id="1027">
    <title>Coordination Boundary Identification with Similarity and Replaceability</title>
    <author><first>Hiroki</first><last>Teranishi</last></author>
    <author><first>Hiroyuki</first><last>Shindo</last></author>
    <author><first>Yuji</first><last>Matsumoto</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>264&#8211;272</pages>
    <url>http://www.aclweb.org/anthology/I17-1027</url>
    <abstract>We propose a neural network model for coordination boundary detection. Our
	method relies on two common properties of conjuncts &#8211; similarity and
	replaceability &#8211; in order to detect both similar and dissimilar pairs of
	conjuncts. The model improves identification of clause-level coordination
	using bidirectional RNNs that incorporate the two properties as features.
	We show that our model outperforms the existing state-of-the-art methods on
	the coordination-annotated Penn Treebank and Genia corpus without any
	syntactic information from parsers.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>teranishi-shindo-matsumoto:2017:I17-1</bibkey>
  </paper>

  <paper id="1028">
    <title>Turning Distributional Thesauri into Word Vectors for Synonym Extraction and Expansion</title>
    <author><first>Olivier</first><last>Ferret</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>273&#8211;283</pages>
    <url>http://www.aclweb.org/anthology/I17-1028</url>
    <abstract>In this article, we investigate a new problem: turning a distributional
	thesaurus into dense word vectors. More precisely, we propose a method for
	performing this task by combining graph embedding with distributed
	representation adaptation. We apply and evaluate it at a large scale on
	English nouns with respect to its ability to retrieve synonyms. In this
	context, we also illustrate the usefulness of the method for three different
	tasks: improving already existing word embeddings, fusing heterogeneous
	representations, and expanding synsets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ferret:2017:I17-1</bibkey>
  </paper>

  <paper id="1029">
    <title>Training Word Sense Embeddings With Lexicon-based Regularization</title>
    <author><first>Luis</first><last>Nieto Pi&#241;a</last></author>
    <author><first>Richard</first><last>Johansson</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>284&#8211;294</pages>
    <url>http://www.aclweb.org/anthology/I17-1029</url>
    <abstract>We propose to improve word sense embeddings by enriching an automatic
	corpus-based method with lexicographic data. Information from a lexicon is
	introduced into the learning algorithm’s objective function through a
	regularizer. The incorporation of lexicographic data yields embeddings that are
	able to reflect expert-defined word senses, while retaining the robustness,
	high quality, and coverage of automatic corpus-based methods. These properties
	are observed in a manual inspection of the semantic clusters that different
	degrees of regularizer strength create in the vector space. Moreover, we
	evaluate the sense embeddings in two downstream applications: word sense
	disambiguation and semantic frame prediction, where they outperform simpler
	approaches. Our results show that a corpus-based model balanced with
	lexicographic data learns better representations and improves performance
	in downstream tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nietopina-johansson:2017:I17-1</bibkey>
  </paper>

  <paper id="1030">
    <title>Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs</title>
    <author><first>Fernando</first><last>Alva-Manchego</last></author>
    <author><first>Joachim</first><last>Bingel</last></author>
    <author><first>Gustavo</first><last>Paetzold</last></author>
    <author><first>Carolina</first><last>Scarton</last></author>
    <author><first>Lucia</first><last>Specia</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>295&#8211;305</pages>
    <url>http://www.aclweb.org/anthology/I17-1030</url>
    <attachment type="note">I17-1030.Notes.pdf</attachment>
    <abstract>Current research in text simplification (TS) has been hampered by two central
	problems: (i) the small amount of high-quality parallel simplification data
	available, and (ii) the lack of explicit annotations of simplification
	operations, such as deletions or substitutions, on existing data. While the
	recently introduced Newsela corpus has alleviated the first problem,
	simplifications still need to be learned directly from parallel text using
	black-box, end-to-end approaches rather than from explicit annotations. These
	complex-simple parallel sentence pairs often differ to such a high degree that
	generalization becomes difficult.  End-to-end models also make it hard to
	interpret what is actually learned from data. We propose a method that
	decomposes the task of TS into its sub-problems. We devise a way to
	automatically identify operations in a parallel corpus and introduce a
	sequence-labeling approach based on these annotations. Finally, we provide
	insights on the types of transformations that different approaches can model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>alvamanchego-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1031">
    <title>Domain-Adaptable Hybrid Generation of RDF Entity Descriptions</title>
    <author><first>Or</first><last>Biran</last></author>
    <author><first>Kathleen</first><last>McKeown</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>306&#8211;315</pages>
    <url>http://www.aclweb.org/anthology/I17-1031</url>
    <abstract>RDF ontologies provide structured data on entities in many domains and continue
	to grow in size and diversity. While they can be useful as a starting point for
	generating descriptions of entities, they often miss important information
	about an entity that cannot be captured as simple relations. In addition,
	generic approaches to generation from RDF cannot capture the unique style and
	content of specific domains. We describe a framework for hybrid generation of
	entity descriptions, which combines generation from RDF data with text
	extracted from a corpus, and extracts unique aspects of the domain from the
	corpus to create domain-specific generation systems. We show that each
	component of our approach significantly increases the satisfaction of readers
	with the text across multiple applications and domains.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>biran-mckeown:2017:I17-1</bibkey>
  </paper>

  <paper id="1032">
    <title>ES-LDA: Entity Summarization using Knowledge-based Topic Modeling</title>
    <author><first>Seyedamin</first><last>Pouriyeh</last></author>
    <author><first>Mehdi</first><last>Allahyari</last></author>
    <author><first>Krzysztof</first><last>Kochut</last></author>
    <author><first>Gong</first><last>Cheng</last></author>
    <author><first>Hamid Reza</first><last>Arabnia</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>316&#8211;325</pages>
    <url>http://www.aclweb.org/anthology/I17-1032</url>
    <abstract>With the advent of the Internet, the number of Semantic Web documents that
	describe real-world entities and their inter-links as a set of statements has
	grown considerably. These descriptions are usually lengthy, which makes the
	utilization of the underlying entities a difficult task. Entity summarization,
	which aims to create summaries for real-world entities, has gained increasing
	attention in recent years. In this paper, we propose a probabilistic topic
	model, ES-LDA, that combines prior knowledge with statistical learning
	techniques within a single framework to create more reliable and representative
	summaries for entities. We demonstrate the effectiveness of our approach by
	conducting extensive experiments and show that our model outperforms the
	state-of-the-art techniques and enhances the quality of the entity summaries.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>pouriyeh-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1033">
    <title>Procedural Text Generation from an Execution Video</title>
    <author><first>Atsushi</first><last>Ushiku</last></author>
    <author><first>Hayato</first><last>Hashimoto</last></author>
    <author><first>Atsushi</first><last>Hashimoto</last></author>
    <author><first>Shinsuke</first><last>Mori</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>326&#8211;335</pages>
    <url>http://www.aclweb.org/anthology/I17-1033</url>
    <abstract>In recent years, there has been a surge of interest in automatically describing
	images or videos in a natural language. These descriptions are useful for
	image/video search, etc. In this paper, we focus on procedure execution videos,
	in which a human makes or repairs something, and propose a method for
	generating procedural texts from them. Since the available video/text pairs
	are limited in size, the direct application of end-to-end deep learning is
	not feasible. Thus we propose to train a Faster R-CNN network for object
	recognition and an LSTM for text generation, and to combine them at run time.
	We took pairs of recipes and cooking videos, generated a recipe from each
	video, and compared it with the original recipe.
	The experimental results showed that our method can produce recipes as
	accurate as state-of-the-art scene descriptions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ushiku-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1034">
    <title>Text Sentiment Analysis based on Fusion of Structural Information and Serialization Information</title>
    <author><first>Ling</first><last>Gan</last></author>
    <author><first>Houyu</first><last>Gong</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>336&#8211;341</pages>
    <url>http://www.aclweb.org/anthology/I17-1034</url>
    <abstract>Tree-structured Long Short-Term Memory (Tree-LSTM) has proved to be an
	effective method for the sentiment analysis task. It extracts structural
	information from text and uses Long Short-Term Memory (LSTM) cells to prevent
	the vanishing gradient problem. However, even with the LSTM cell, it remains a
	model that extracts structural information while capturing almost no
	serialization information. In this paper, we propose three new models that
	combine those two kinds of information: the structural information generated
	by the Constituency Tree-LSTM and the serialization information generated by a
	Long Short-Term Memory neural network. Our experiments show that combining
	those two kinds of information improves performance on the sentiment analysis
	task compared with the single Constituency Tree-LSTM model and the LSTM
	model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gan-gong:2017:I17-1</bibkey>
  </paper>

  <paper id="1035">
    <title>Length, Interchangeability, and External Knowledge: Observations from Predicting Argument Convincingness</title>
    <author><first>Peter</first><last>Potash</last></author>
    <author><first>Robin</first><last>Bhattacharya</last></author>
    <author><first>Anna</first><last>Rumshisky</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>342&#8211;351</pages>
    <url>http://www.aclweb.org/anthology/I17-1035</url>
    <abstract>In this work, we provide insight into three key aspects related to predicting
	argument convincingness. First, we explicitly display the power that text
	length possesses for predicting convincingness in an unsupervised setting.
	Second, we show that a bag-of-words embedding model posts state-of-the-art
	results on a dataset of arguments annotated for convincingness, outperforming an
	SVM with numerous hand-crafted features as well as recurrent neural network
	models that attempt to capture semantic composition. Finally, we assess
	the feasibility of integrating external knowledge when predicting
	convincingness, as arguments are often more convincing when they contain
	abundant information and facts. We finish by analyzing the correlations
	between the various models we propose.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>potash-bhattacharya-rumshisky:2017:I17-1</bibkey>
  </paper>

  <paper id="1036">
    <title>Exploiting Document Level Information to Improve Event Detection via Recurrent Neural Networks</title>
    <author><first>Shaoyang</first><last>Duan</last></author>
    <author><first>Ruifang</first><last>He</last></author>
    <author><first>Wenli</first><last>Zhao</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>352&#8211;361</pages>
    <url>http://www.aclweb.org/anthology/I17-1036</url>
    <abstract>This paper tackles the task of event detection, which involves identifying and
	categorizing events. Previous work suffers from two main problems: (1)
	traditional feature-based methods exploit cross-sentence information, yet
	require a large amount of human effort to design complicated feature sets and
	inference rules; (2) representation-based methods avoid manual feature
	engineering but depend only on local sentence representations. Because local
	sentence context is insufficient to resolve ambiguities in identifying
	particular event types, we propose a novel document-level Recurrent Neural
	Network (DLRNN) model, which can automatically extract cross-sentence clues to
	improve sentence-level event detection without designing complex reasoning
	rules. Experimental results show that our approach outperforms other
	state-of-the-art methods on the ACE 2005 dataset without an external knowledge
	base.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>duan-he-zhao:2017:I17-1</bibkey>
  </paper>

  <paper id="1037">
    <title>Embracing Non-Traditional Linguistic Resources for Low-resource Language Name Tagging</title>
    <author><first>Boliang</first><last>Zhang</last></author>
    <author><first>Di</first><last>Lu</last></author>
    <author><first>Xiaoman</first><last>Pan</last></author>
    <author><first>Ying</first><last>Lin</last></author>
    <author><first>Halidanmu</first><last>Abudukelimu</last></author>
    <author><first>Heng</first><last>Ji</last></author>
    <author><first>Kevin</first><last>Knight</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>362&#8211;372</pages>
    <url>http://www.aclweb.org/anthology/I17-1037</url>
    <abstract>Current supervised name tagging approaches are inadequate for most low-resource
	languages due to the lack of annotated data and actionable linguistic
	knowledge. 
	All supervised learning methods (including deep neural networks (DNN)) are
	sensitive to noise and thus they are not quite portable without massive clean
	annotations. We found that the F-scores of DNN-based name taggers drop rapidly
	(20%-30%) when we replace clean manual annotations with noisy annotations in
	the training data. We propose a new solution to incorporate many
	non-traditional language universal resources that are readily available but
	rarely explored in the Natural Language Processing (NLP) community, such as the
	World Atlas of Linguistic Structure, CIA names, PanLex and survival guides. 
	We acquire and encode various types of non-traditional 
	linguistic resources into a DNN name tagger. Experiments on three low-resource
	languages show that feeding linguistic knowledge 
	can make DNN significantly more robust to noise, achieving 8%-22% absolute
	F-score gains on name tagging without using any human annotation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhang-EtAl:2017:I17-13</bibkey>
  </paper>

  <paper id="1038">
    <title>NMT or SMT: Case Study of a Narrow-domain English-Latvian Post-editing Project</title>
    <author><first>Inguna</first><last>Skadina</last></author>
    <author><first>M&#257;rcis</first><last>Pinnis</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>373&#8211;383</pages>
    <url>http://www.aclweb.org/anthology/I17-1038</url>
    <abstract>The recent technological shift in machine translation from statistical machine
	translation (SMT) to neural machine translation (NMT) raises the question of
	the strengths and weaknesses of NMT. In this paper, we present an analysis of
	NMT and SMT systems' outputs from narrow domain English-Latvian MT systems that
	were trained on a rather small amount of data. We analyze post-edits produced
	by professional translators and manually annotated errors in these outputs.
	Analysis of post-edits allowed us to conclude that both approaches are
	comparably successful, allowing for an increase in translators' productivity,
	with the NMT system showing slightly worse results. Through the analysis of
	annotated errors, we found that NMT translations are more fluent than SMT
	translations. However, errors related to accuracy, especially, mistranslation
	and omission errors, occur more often in NMT outputs. Word form errors,
	which reflect the morphological richness of Latvian, are frequent for both
	systems, but slightly less frequent in NMT outputs.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>skadina-pinnis:2017:I17-1</bibkey>
  </paper>

  <paper id="1039">
    <title>Towards Neural Machine Translation with Partially Aligned Corpora</title>
    <author><first>Yining</first><last>Wang</last></author>
    <author><first>Yang</first><last>Zhao</last></author>
    <author><first>Jiajun</first><last>Zhang</last></author>
    <author><first>Chengqing</first><last>Zong</last></author>
    <author><first>Zhengshan</first><last>Xue</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>384&#8211;393</pages>
    <url>http://www.aclweb.org/anthology/I17-1039</url>
    <abstract>While neural machine translation (NMT) has become the new paradigm, the
	parameter optimization requires large-scale parallel data which is scarce in
	many domains and language pairs. In this paper, we address a new translation
	scenario in which only monolingual corpora and phrase pairs exist. We
	propose a new method towards translation with partially aligned sentence pairs
	which are derived from the phrase pairs and monolingual corpora. To make full
	use of the partially aligned corpora, we adapt the conventional NMT training
	method in two aspects. On one hand, different generation strategies are
	designed for aligned and unaligned target words. On the other hand, a different
	objective function is designed to model the partially aligned parts. The
	experiments demonstrate that our method can achieve a relatively good result in
	such a translation scenario, and tiny bitexts can boost translation quality to
	a large extent.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wang-EtAl:2017:I17-11</bibkey>
  </paper>

  <paper id="1040">
    <title>Identifying Usage Expression Sentences in Consumer Product Reviews</title>
    <author><first>Shibamouli</first><last>Lahiri</last></author>
    <author><first>V.G.Vinod</first><last>Vydiswaran</last></author>
    <author><first>Rada</first><last>Mihalcea</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>394&#8211;403</pages>
    <url>http://www.aclweb.org/anthology/I17-1040</url>
    <abstract>In this paper we introduce the problem of identifying usage expression
	sentences in a consumer product review.  We create a human-annotated gold
	standard dataset of 565 reviews spanning five distinct product categories. Our
	dataset consists of more than 3,000 annotated sentences. We further introduce a
	classification system to label sentences according to whether or not they
	describe some "usage". The system combines lexical, syntactic, and semantic
	features in a product-agnostic fashion to yield good classification
	performance. We show the effectiveness of our approach using importance ranking
	of features, error analysis, and cross-product classification experiments.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lahiri-vydiswaran-mihalcea:2017:I17-1</bibkey>
  </paper>

  <paper id="1041">
    <title>Between Reading Time and Syntactic/Semantic Categories</title>
    <author><first>Masayuki</first><last>Asahara</last></author>
    <author><first>Sachi</first><last>Kato</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>404&#8211;412</pages>
    <url>http://www.aclweb.org/anthology/I17-1041</url>
    <abstract>This article presents a contrastive analysis between reading time and
	syntactic/semantic categories in Japanese. We overlaid the reading time
	annotation of BCCWJ-EyeTrack and a syntactic/semantic category information
	annotation on the `Balanced Corpus of Contemporary Written Japanese'.
	Statistical analysis based on a mixed linear model showed that verbal phrases
	tend to have shorter reading times than adjectives, adverbial phrases, or
	nominal phrases. The results suggest that preceding phrases associated with
	the current phrase facilitate the reading process, shortening gaze time.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>asahara-kato:2017:I17-1</bibkey>
  </paper>

  <paper id="1042">
    <title>WiNER: A Wikipedia Annotated Corpus for Named Entity Recognition</title>
    <author><first>Abbas</first><last>Ghaddar</last></author>
    <author><first>Phillippe</first><last>Langlais</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>413&#8211;422</pages>
    <url>http://www.aclweb.org/anthology/I17-1042</url>
    <abstract>We revisit the idea of mining Wikipedia in order to generate named-entity 
	annotations. We propose a new methodology that we applied to English Wikipedia
	to build WiNER, a large, high quality, annotated corpus. We evaluate its
	usefulness on 6 NER tasks, comparing 4 popular state-of-the-art approaches. We
	show that LSTM-CRF is the approach that benefits the most from our corpus. We
	report impressive gains with this model when using a small portion of WiNER on
	top of the CoNLL training material. Lastly, we propose a simple but efficient
	method for exploiting the full range of WiNER, leading to further improvements.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ghaddar-langlais:2017:I17-1</bibkey>
  </paper>

  <paper id="1043">
    <title>Reusing Neural Speech Representations for Auditory Emotion Recognition</title>
    <author><first>Egor</first><last>Lakomkin</last></author>
    <author><first>Cornelius</first><last>Weber</last></author>
    <author><first>Sven</first><last>Magg</last></author>
    <author><first>Stefan</first><last>Wermter</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>423&#8211;430</pages>
    <url>http://www.aclweb.org/anthology/I17-1043</url>
    <abstract>Acoustic emotion recognition aims to categorize the affective state of the
	speaker and is still a difficult task for machine learning models. The
	difficulties come from the scarcity of training data, general subjectivity in
	emotion perception resulting in low annotator agreement, and the uncertainty
	about which features are the most relevant and robust ones for classification.
	In this paper, we will tackle the latter problem. Inspired by the recent
	success of transfer learning methods we propose a set of architectures which
	utilize neural representations inferred by training on large speech databases
	for the acoustic emotion recognition task. Our experiments on the IEMOCAP
	dataset show ~10% relative improvements in accuracy and F1-score over the
	baseline recurrent neural network which is trained end-to-end for emotion
	recognition.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lakomkin-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1044">
    <title>Local Monotonic Attention Mechanism for End-to-End Speech And Language Processing</title>
    <author><first>Andros</first><last>Tjandra</last></author>
    <author><first>Sakriani</first><last>Sakti</last></author>
    <author><first>Satoshi</first><last>Nakamura</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>431&#8211;440</pages>
    <url>http://www.aclweb.org/anthology/I17-1044</url>
    <abstract>Recently, encoder-decoder neural networks have shown impressive performance on
	many sequence-related tasks. The architecture commonly uses an attentional
	mechanism which allows the model to learn alignments between the source and the
	target sequence. Most attentional mechanisms used today are based on a global
	attention property, which requires computing a weighted summarization of the
	whole input sequence generated by the encoder states. However, this is
	computationally expensive and often produces misalignment on longer input
	sequences. Furthermore, it does not fit the monotonic, left-to-right nature of
	several tasks, such as automatic speech recognition (ASR) and
	grapheme-to-phoneme conversion (G2P). In this paper, we propose a novel
	attention mechanism that has local and monotonic properties. Various ways to
	control those properties are also explored. Experimental results on ASR, G2P,
	and machine translation between two languages with similar sentence structures
	demonstrate that the proposed encoder-decoder model with local monotonic
	attention achieves significant performance improvements and reduces the
	computational complexity in comparison with the standard global attention
	architecture.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tjandra-sakti-nakamura:2017:I17-1</bibkey>
  </paper>

  <paper id="1045">
    <title>Attentive Language Models</title>
    <author><first>Giancarlo</first><last>Salton</last></author>
    <author><first>Robert</first><last>Ross</last></author>
    <author><first>John</first><last>Kelleher</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>441&#8211;450</pages>
    <url>http://www.aclweb.org/anthology/I17-1045</url>
    <abstract>In this paper, we extend Recurrent Neural Network Language Models (RNN-LMs)
	with an attention mechanism. We show that an "attentive" RNN-LM (with 11M
	parameters) achieves a better perplexity than larger RNN-LMs (with 66M
	parameters) and achieves performance comparable to an ensemble of 10 similar
	sized RNN-LMs. We also show that an "attentive" RNN-LM needs less contextual
	information to achieve similar results to the state-of-the-art on the wikitext2
	dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>salton-ross-kelleher:2017:I17-1</bibkey>
  </paper>

  <paper id="1046">
    <title>Diachrony-aware Induction of Binary Latent Representations from Typological Features</title>
    <author><first>Yugo</first><last>Murawaki</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>451&#8211;461</pages>
    <url>http://www.aclweb.org/anthology/I17-1046</url>
    <attachment type="note">I17-1046.Notes.pdf</attachment>
    <abstract>Although features of linguistic typology are a promising alternative to lexical
	evidence for tracing evolutionary history of languages, a large number of
	missing values in the dataset pose serious difficulties for statistical
	modeling.
	In this paper, we combine two existing approaches to the problem: (1) the
	synchronic approach that focuses on interdependencies between features and (2)
	the diachronic approach that exploits phylogenetically- and/or
	spatially-related languages.
	Specifically, we propose a Bayesian model that (1) represents each language as
	a sequence of binary latent parameters encoding inter-feature dependencies and
	(2) relates a language's parameters to those of its phylogenetic and spatial
	neighbors.
	Experiments show that the proposed model recovers missing values more
	accurately than others and that induced representations retain phylogenetic and
	spatial signals observed for surface features.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>murawaki:2017:I17-1</bibkey>
  </paper>

  <paper id="1047">
    <title>Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation</title>
    <author><first>Nasrin</first><last>Mostafazadeh</last></author>
    <author><first>Chris</first><last>Brockett</last></author>
    <author><first>Bill</first><last>Dolan</last></author>
    <author><first>Michel</first><last>Galley</last></author>
    <author><first>Jianfeng</first><last>Gao</last></author>
    <author><first>Georgios</first><last>Spithourakis</last></author>
    <author><first>Lucy</first><last>Vanderwende</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>462&#8211;472</pages>
    <url>http://www.aclweb.org/anthology/I17-1047</url>
    <attachment type="note">I17-1047.Notes.pdf</attachment>
    <abstract>The popularity of image sharing on social media and the engagement it creates
	between users reflect the important role that visual context plays in
	everyday conversations. We present a novel task, Image Grounded Conversations
	(IGC), in which natural-sounding conversations are generated about a shared
	image. To benchmark progress, we introduce a new multiple reference dataset of
	crowd-sourced, event-centric conversations on images. IGC falls on the
	continuum between chit-chat and goal-directed conversation models, where visual
	grounding constrains the topic of conversation to event-driven utterances.
	Experiments with models trained on social media data show that the combination
	of visual and textual context enhances the quality of generated conversational
	turns. In human evaluation, the gap between human performance and that of both
	neural and retrieval architectures suggests that multi-modal IGC presents an
	interesting challenge for dialog research.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mostafazadeh-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1048">
    <title>A Neural Language Model for Dynamically Representing the Meanings of Unknown Words and Entities in a Discourse</title>
    <author><first>Sosuke</first><last>Kobayashi</last></author>
    <author><first>Naoaki</first><last>Okazaki</last></author>
    <author><first>Kentaro</first><last>Inui</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>473&#8211;483</pages>
    <url>http://www.aclweb.org/anthology/I17-1048</url>
    <abstract>This study addresses the problem of identifying the meaning of unknown words or
	entities in a discourse with respect to the word embedding approaches used in
	neural language models. We propose a method for on-the-fly construction and
	exploitation of word embeddings in both the input and output layers of a neural
	model by tracking contexts. This extends the dynamic entity representation used
	in Kobayashi et al. (2016) and incorporates a copy mechanism proposed
	independently by Gu et al. (2016) and Gulcehre et al. (2016). In addition, we
	construct a new task and dataset called Anonymized Language Modeling for
	evaluating the ability to capture word meanings while reading. Experiments
	conducted using our novel dataset show that the proposed variant of RNN
	language model outperformed the baseline model. Furthermore, the experiments
	also demonstrate that dynamic updates of an output layer help a model predict
	reappearing entities, whereas those of an input layer are effective for
	predicting words that follow reappearing entities.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kobayashi-okazaki-inui:2017:I17-1</bibkey>
  </paper>

  <paper id="1049">
    <title>Using Explicit Discourse Connectives in Translation for Implicit Discourse Relation Classification</title>
    <author><first>Wei</first><last>Shi</last></author>
    <author><first>Frances</first><last>Yung</last></author>
    <author><first>Raphael</first><last>Rubino</last></author>
    <author><first>Vera</first><last>Demberg</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>484&#8211;495</pages>
    <url>http://www.aclweb.org/anthology/I17-1049</url>
    <abstract>Implicit discourse relation recognition is an extremely challenging task due to
	the lack of indicative connectives. Various neural network architectures have
	been proposed for this task recently, but most of them suffer from the shortage
	of labeled data. In this paper, we address this problem by procuring additional
	training data from parallel corpora: When humans translate a text, they
	sometimes add connectives (a process known as explicitation). We automatically
	back-translate such added connectives into English connectives and use them to
	infer relation labels with high confidence. We show that a training set several times larger
	than the original training set can be generated this way. With the extra
	labeled instances, we show that even a simple bidirectional Long Short-Term
	Memory Network can outperform the current state-of-the-art.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shi-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1050">
    <title>Tag-Enhanced Tree-Structured Neural Networks for Implicit Discourse Relation Classification</title>
    <author><first>Yizhong</first><last>Wang</last></author>
    <author><first>Sujian</first><last>Li</last></author>
    <author><first>Jingfeng</first><last>Yang</last></author>
    <author><first>Xu</first><last>Sun</last></author>
    <author><first>Houfeng</first><last>Wang</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>496&#8211;505</pages>
    <url>http://www.aclweb.org/anthology/I17-1050</url>
    <abstract>Identifying implicit discourse relations between text spans is a challenging
	task because it requires understanding the meaning of the text. To tackle this
	task, recent studies have tried several deep learning methods but few of them
	exploited the syntactic information. In this work, we explore the idea of
	incorporating syntactic parse tree into neural networks. Specifically, we
	employ the Tree-LSTM and Tree-GRU models, which are based on the tree
	structure, to encode the arguments of a relation. We further leverage the
	constituent tags to control the semantic composition process in these
	tree-structured neural networks. Experimental results show that our method
	achieves state-of-the-art performance on the PDTB corpus.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wang-EtAl:2017:I17-12</bibkey>
  </paper>

  <paper id="1051">
    <title>Cross-Lingual Sentiment Analysis Without (Good) Translation</title>
    <author><first>Mohamed</first><last>Abdalla</last></author>
    <author><first>Graeme</first><last>Hirst</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>506&#8211;515</pages>
    <url>http://www.aclweb.org/anthology/I17-1051</url>
    <abstract>Current approaches to cross-lingual sentiment analysis try to leverage the
	wealth of labeled English data using bilingual lexicons, bilingual vector space
	embeddings, or machine translation systems. Here we show that it is possible to
	use a single linear transformation, with as few as 2000 word pairs, to capture
	fine-grained sentiment relationships between words in a cross-lingual setting.
	We apply these cross-lingual sentiment models to a diverse set of tasks to
	demonstrate their functionality in a non-English context. By effectively
	leveraging English sentiment knowledge without the need for accurate
	translation, we can analyze and extract features from other languages with
	scarce data at a very low cost, thus making sentiment and related analyses for
	many languages inexpensive.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>abdalla-hirst:2017:I17-1</bibkey>
  </paper>

  <paper id="1052">
    <title>Implicit Syntactic Features for Target-dependent Sentiment Analysis</title>
    <author><first>Yuze</first><last>Gao</last></author>
    <author><first>Yue</first><last>Zhang</last></author>
    <author><first>Tong</first><last>Xiao</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>516&#8211;524</pages>
    <url>http://www.aclweb.org/anthology/I17-1052</url>
    <abstract>Targeted sentiment analysis investigates the sentiment polarities on given
	target mentions from input texts. Different from sentence level sentiment, it
	offers more fine-grained knowledge on each entity mention. While early work
	leveraged syntactic information, recent research has used neural representation
	learning to induce features automatically, thereby avoiding error propagation
	of syntactic parsers, which are particularly severe on social media texts.
	We study a method to leverage syntactic information without explicitly building
	the parser outputs, by training an encoder-decoder structure parser model on
	standard syntactic treebanks, and then leveraging its hidden encoder layers
	when analysing tweets. Such hidden vectors do not contain explicit syntactic
	outputs, yet encode rich syntactic features. We use them to augment the inputs
	to a baseline state-of-the-art targeted sentiment classifier, observing
	significant improvements on various benchmark datasets. We obtain the best
	accuracies on all test sets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gao-zhang-xiao:2017:I17-1</bibkey>
  </paper>

  <paper id="1053">
    <title>Graph Based Sentiment Aggregation using ConceptNet Ontology</title>
    <author><first>Srikanth</first><last>Tamilselvam</last></author>
    <author><first>Seema</first><last>Nagar</last></author>
    <author><first>Abhijit</first><last>Mishra</last></author>
    <author><first>Kuntal</first><last>Dey</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>525&#8211;535</pages>
    <url>http://www.aclweb.org/anthology/I17-1053</url>
    <abstract>The sentiment aggregation problem involves analyzing the sentiment of a
	user towards various aspects/features of a product, and meaningfully
	assimilating the pragmatic significance of these features/aspects from an
	opinionated text. The current paper addresses the sentiment aggregation
	problem, by assigning weights to each aspect appearing in the user-generated
	content, that are proportionate to the strategic importance of the aspect in
	the pragmatic domain. The novelty of this paper is in computing the pragmatic
	significance (weight) of each aspect, using graph centrality measures (applied
	on domain specific ontology-graphs extracted from ConceptNet), and deeply
	ingraining these weights while aggregating the sentiments from opinionated
	text. We experiment on multiple real-life product review datasets. Our system
	consistently outperforms the state of the art, by as much as 20.39% F-score
	in one case.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tamilselvam-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1054">
    <title>Sentence Modeling with Deep Neural Architecture using Lexicon and Character Attention Mechanism for Sentiment Classification</title>
    <author><first>Huy-Thanh</first><last>Nguyen</last></author>
    <author><first>Minh-Le</first><last>Nguyen</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>536&#8211;544</pages>
    <url>http://www.aclweb.org/anthology/I17-1054</url>
    <abstract>Tweet-level sentiment classification in Twitter social networking has many
	challenges: exploiting the syntax, semantics, sentiment, and context of tweets. To
	address these problems, we propose a novel approach to sentiment analysis that
	uses lexicon features for building lexicon embeddings (LexW2Vs) and generates
	character attention vectors (CharAVs) by using a Deep Convolutional Neural
	Network (DeepCNN). Our approach integrates LexW2Vs and CharAVs with continuous
	word embeddings (ContinuousW2Vs) and dependency-based word embeddings
	(DependencyW2Vs) simultaneously in order to increase information for each word
	into a Bidirectional Contextual Gated Recurrent Neural Network (Bi-CGRNN). We
	evaluate our model on two Twitter sentiment classification datasets.
	Experimental results show that our model can improve the classification
	accuracy of sentence-level sentiment analysis in Twitter social networking.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nguyen-nguyen:2017:I17-1</bibkey>
  </paper>

  <paper id="1055">
    <title>Combining Lightly-Supervised Text Classification Models for Accurate Contextual Advertising</title>
    <author><first>Yiping</first><last>Jin</last></author>
    <author><first>Dittaya</first><last>Wanvarie</last></author>
    <author><first>Phu</first><last>Le</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>545&#8211;554</pages>
    <url>http://www.aclweb.org/anthology/I17-1055</url>
    <abstract>In this paper we propose a lightly-supervised framework to rapidly build text
	classifiers for contextual advertising. Traditionally text classification
	techniques require labeled training documents for each predefined class. In the
	scenario of contextual advertising, advertisers often want to target a
	specific class of webpages most relevant to their product or service, which may
	not be covered by a pre-trained classifier. Moreover, the advertisers are
	interested in whether a webpage is &#x201c;relevant&#x201d; or &#x201c;irrelevant&#x201d;. It is
	time-consuming to solicit the advertisers for reliable training signals for the
	negative class. Therefore, it is more suitable to model the problem as a
	one-class classification problem, in contrast to traditional classification
	problems where disjoint classes are defined a priori.
	We first apply two state-of-the-art lightly-supervised classification models,
	generalized expectation (GE) criteria (Druck et al., 2008) and multinomial
	naive Bayes (MNB) with priors (Settles, 2011) to one-class classification where
	the user only needs to provide a small list of labeled words for the target
	class. To combine the strengths of the two models, we fuse them together by
	using MNB to automatically enrich the constraints for GE training. We also
	explore ensemble method to combine classifiers. On a corpus of webpages from
	real-time bidding requests, the proposed GE+MNB model achieves the highest
	average F1 of 0.69 and closes more than half of the gap between previous
	state-of-the-art lightly-supervised models and a fully-supervised MaxEnt model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jin-wanvarie-le:2017:I17-1</bibkey>
  </paper>

  <paper id="1056">
    <title>Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional Random Fields</title>
    <author><first>Fei</first><last>Liu</last></author>
    <author><first>Timothy</first><last>Baldwin</last></author>
    <author><first>Trevor</first><last>Cohn</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>555&#8211;565</pages>
    <url>http://www.aclweb.org/anthology/I17-1056</url>
    <abstract>Despite successful applications across a broad range of NLP tasks, conditional
	random fields (&#x201c;CRFs&#x201d;), in particular the linear-chain variant, are only able
	to model local features.
	  While this has important benefits in terms of inference tractability, it
	limits the ability of the model to capture long-range dependencies between
	items.
	  Attempts to extend CRFs to capture long-range dependencies have largely come
	at the cost of computational complexity and approximate inference.
	  In this work, we propose an extension to CRFs by integrating external memory,
	taking inspiration from memory networks, thereby allowing CRFs to incorporate
	information far beyond neighbouring steps.
	  Experiments across two tasks show substantial improvements over strong CRF
	and LSTM baselines.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>liu-baldwin-cohn:2017:I17-1</bibkey>
  </paper>

  <paper id="1057">
    <title>Named Entity Recognition with Stack Residual LSTM and Trainable Bias Decoding</title>
    <author><first>Quan</first><last>Tran</last></author>
    <author><first>Andrew</first><last>MacKinlay</last></author>
    <author><first>Antonio</first><last>Jimeno Yepes</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>566&#8211;575</pages>
    <url>http://www.aclweb.org/anthology/I17-1057</url>
    <abstract></abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tran-mackinlay-jimenoyepes:2017:I17-1</bibkey>
  </paper>

  <paper id="1058">
    <title>Neuramanteau: A Neural Network Ensemble Model for Lexical Blends</title>
    <author><first>Kollol</first><last>Das</last></author>
    <author><first>Shaona</first><last>Ghosh</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>576&#8211;583</pages>
    <url>http://www.aclweb.org/anthology/I17-1058</url>
    <attachment type="note">I17-1058.Notes.pdf</attachment>
    <abstract>The problem of blend formation in generative linguistics is interesting in the
	context of neologisms, their quick adoption in modern life, and the creative
	generative process guiding their formation. Blend quality depends on a
	multitude of factors with high degrees of uncertainty. In this work, we
	investigate whether modern neural network models can sufficiently capture and
	recognize the creative blend composition process. We propose recurrent neural
	network sequence-to-sequence models that are evaluated on multiple blend
	datasets available in the literature. We also propose an ensemble neural and
	hybrid model that outperforms most of the baselines and heuristic models when
	evaluated on test data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>das-ghosh:2017:I17-1</bibkey>
  </paper>

  <paper id="1059">
    <title>Leveraging Discourse Information Effectively for Authorship Attribution</title>
    <author><first>Elisa</first><last>Ferracane</last></author>
    <author><first>Su</first><last>Wang</last></author>
    <author><first>Raymond</first><last>Mooney</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>584&#8211;593</pages>
    <url>http://www.aclweb.org/anthology/I17-1059</url>
    <abstract>We explore techniques to maximize the effectiveness of discourse information in
	the task of authorship attribution. We present a novel method to embed
	discourse features in a Convolutional Neural Network text classifier, which
	achieves a state-of-the-art result by a significant margin. We empirically
	investigate several featurization methods to understand the conditions under
	which discourse features contribute non-trivial performance gains, and analyze
	discourse embeddings.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ferracane-wang-mooney:2017:I17-1</bibkey>
  </paper>

  <paper id="1060">
    <title>Lightly-Supervised Modeling of Argument Persuasiveness</title>
    <author><first>Isaac</first><last>Persing</last></author>
    <author><first>Vincent</first><last>Ng</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>594&#8211;604</pages>
    <url>http://www.aclweb.org/anthology/I17-1060</url>
    <abstract>We propose the first lightly-supervised approach to scoring an argument's
	persuasiveness. Key to our approach is the novel hypothesis that
	lightly-supervised persuasiveness scoring is possible by explicitly modeling
	the major errors that negatively impact persuasiveness. In an evaluation on a
	new annotated corpus of online debate arguments, our approach rivals its
	fully-supervised counterparts in performance on four scoring metrics when using
	only 10% of the available training instances.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>persing-ng:2017:I17-1</bibkey>
  </paper>

  <paper id="1061">
    <title>Multi-Task Learning for Speaker-Role Adaptation in Neural Conversation Models</title>
    <author><first>Yi</first><last>Luan</last></author>
    <author><first>Chris</first><last>Brockett</last></author>
    <author><first>Bill</first><last>Dolan</last></author>
    <author><first>Jianfeng</first><last>Gao</last></author>
    <author><first>Michel</first><last>Galley</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>605&#8211;614</pages>
    <url>http://www.aclweb.org/anthology/I17-1061</url>
    <abstract>Building a persona-based conversation agent is challenging owing to the lack of
	large amounts of speaker-specific conversation data for model training. This
	paper addresses the problem by proposing a multi-task learning approach to
	training neural conversation models that leverages both conversation data
	across speakers and other types of data pertaining to the speaker and speaker
	roles to be modeled. Experiments show that our approach leads to significant
	improvements over baseline model quality, generating responses that capture
	more precisely speakers’ traits and speaking styles. The model offers the
	benefits of being algorithmically simple and easy to implement, and not relying
	on large quantities of data representing specific individual speakers.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>luan-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1062">
    <title>Chat Disentanglement: Identifying Semantic Reply Relationships with Random Forests and Recurrent Neural Networks</title>
    <author><first>Shikib</first><last>Mehri</last></author>
    <author><first>Giuseppe</first><last>Carenini</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>615&#8211;623</pages>
    <url>http://www.aclweb.org/anthology/I17-1062</url>
    <dataset>I17-1062.Datasets.tgz</dataset>
    <abstract>Thread disentanglement is a precursor to any high-level analysis of
	multiparticipant chats. Existing research approaches the problem by calculating
	the likelihood of two messages belonging in the same thread. Our approach
	leverages a newly annotated dataset to identify reply relationships.
	Furthermore, we explore the usage of an RNN, along with large quantities of
	unlabeled data, to learn semantic relationships between messages. Our proposed
	pipeline, which utilizes a reply classifier and an RNN to generate a set of
	disentangled threads, is novel and performs well against previous work.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mehri-carenini:2017:I17-1</bibkey>
  </paper>

  <paper id="1063">
    <title>Towards Bootstrapping a Polarity Shifter Lexicon using Linguistic Features</title>
    <author><first>Marc</first><last>Schulder</last></author>
    <author><first>Michael</first><last>Wiegand</last></author>
    <author><first>Josef</first><last>Ruppenhofer</last></author>
    <author><first>Benjamin</first><last>Roth</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>624&#8211;633</pages>
    <url>http://www.aclweb.org/anthology/I17-1063</url>
    <abstract>We present a major step towards the creation of the first high-coverage lexicon
	of polarity shifters. In this work, we bootstrap a lexicon of verbs by
	exploiting various linguistic features. Polarity shifters, such as "abandon",
	are similar to negations (e.g. "not") in that they move the polarity of a
	phrase towards its inverse, as in "abandon all hope".
	While there exist lists of negation words, creating comprehensive lists of
	polarity shifters is far more challenging due to their sheer number. On a
	sample of manually annotated verbs we examine a variety of linguistic features
	for this task. Then we build a supervised classifier to increase coverage. 
	We show that this approach drastically reduces the annotation effort while
	ensuring a high-precision lexicon. We also show that our acquired knowledge of
	verbal polarity shifters improves phrase-level sentiment analysis.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>schulder-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1064">
    <title>Cascading Multiway Attentions for Document-level Sentiment Classification</title>
    <author><first>Dehong</first><last>Ma</last></author>
    <author><first>Sujian</first><last>Li</last></author>
    <author><first>Xiaodong</first><last>Zhang</last></author>
    <author><first>Houfeng</first><last>Wang</last></author>
    <author><first>Xu</first><last>Sun</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>634&#8211;643</pages>
    <url>http://www.aclweb.org/anthology/I17-1064</url>
    <abstract>Document-level sentiment classification aims to assign a sentiment
	polarity to user reviews. Previous methods either utilized only the document content
	without consideration of user and product information, or did not
	comprehensively consider what roles the three kinds of information play in text
	modeling. In this paper, to reasonably use all the information, we present the
	idea that  user, product and their combination can all influence the generation
	of attentions to words and sentences, when judging the sentiment of a document.
	With this idea, we propose a cascading multiway attention (CMA) model, where 
	multiple ways of using user and product information are cascaded to influence
	the generation of attentions on the word and sentence layers. Then, sentences
	and documents are well modeled by multiple representation vectors, which
	provide rich information for sentiment classification. Experiments on IMDB and
	Yelp datasets demonstrate the effectiveness of our model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ma-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1065">
    <title>An Ensemble Method with Sentiment Features and Clustering Support</title>
    <author><first>Nguyen</first><last>Huy Tien</last></author>
    <author><first>Nguyen</first><last>Minh Le</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>644&#8211;653</pages>
    <url>http://www.aclweb.org/anthology/I17-1065</url>
    <abstract>Deep learning models have recently been applied successfully in natural
	language processing, especially sentiment analysis. Each deep learning model
	has a particular advantage, but it is difficult to combine these advantages
	into one model, especially in the area of sentiment analysis. In our approach,
	Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) were
	utilized to learn sentiment-specific features in a freezing scheme. This
	scenario provides a novel and efficient way for integrating advantages of deep
	learning models. In addition, we also grouped documents into clusters by their
	similarity and applied the prediction score of Naive Bayes SVM (NBSVM) method
	to boost the classification accuracy of each group. The experiments show that
	our method achieves the state-of-the-art performance on two well-known
	datasets: IMDB large movie reviews for document level and Pang &#38; Lee movie
	reviews for sentence level.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>huytien-minhle:2017:I17-1</bibkey>
  </paper>

  <paper id="1066">
    <title>Leveraging Auxiliary Tasks for Document-Level Cross-Domain Sentiment Classification</title>
    <author><first>Jianfei</first><last>Yu</last></author>
    <author><first>Jing</first><last>Jiang</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>654&#8211;663</pages>
    <url>http://www.aclweb.org/anthology/I17-1066</url>
    <abstract>In this paper, we study domain adaptation with a state-of-the-art hierarchical
	neural network for document-level sentiment classification. We first design a
	new auxiliary task based on sentiment scores of domain-independent words. We
	then propose two neural network architectures to respectively induce document
	embeddings and sentence embeddings that work well for different domains.
	When these document and sentence embeddings are used for sentiment
	classification, we find that with both pseudo and external sentiment lexicons,
	our proposed methods can perform similarly to or better than several highly
	competitive domain adaptation methods on a benchmark dataset of product
	reviews.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yu-jiang:2017:I17-1</bibkey>
  </paper>

  <paper id="1067">
    <title>Measuring Semantic Relations between Human Activities</title>
    <author><first>Steven</first><last>Wilson</last></author>
    <author><first>Rada</first><last>Mihalcea</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>664&#8211;673</pages>
    <url>http://www.aclweb.org/anthology/I17-1067</url>
    <abstract>The things people do in their daily lives can provide valuable insights into
	their personality, values, and interests. Unstructured text data on social
	media platforms are rich in behavioral content, and automated systems can be
	deployed to learn about human activity on a broad scale if these systems are
	able to reason about the content of interest. In order to aid in the evaluation
	of such systems, we introduce a new phrase-level semantic textual similarity
	dataset composed of human activity phrases, providing a testbed for automated
	systems that analyze relationships between phrasal descriptions of people's
	actions. Our set of 1,000 pairs of activities is annotated by human judges
	across four relational dimensions including similarity, relatedness,
	motivational alignment, and perceived actor congruence. We evaluate a set of
	strong baselines for the task of generating scores that correlate highly with
	human ratings, and we introduce several new approaches to the phrase-level
	similarity task in the domain of human activities.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wilson-mihalcea:2017:I17-1</bibkey>
  </paper>

  <paper id="1068">
    <title>Learning Transferable Representation for Bilingual Relation Extraction via Convolutional Neural Networks</title>
    <author><first>Bonan</first><last>Min</last></author>
    <author><first>Zhuolin</first><last>Jiang</last></author>
    <author><first>Marjorie</first><last>Freedman</last></author>
    <author><first>Ralph</first><last>Weischedel</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>674&#8211;684</pages>
    <url>http://www.aclweb.org/anthology/I17-1068</url>
    <abstract>Typically, relation extraction models are trained to extract instances of a
	relation ontology using only training data from a single language. However, the
	concepts represented by the relation ontology (e.g. ResidesIn, EmployeeOf) are
	language independent. The numbers of annotated examples available for a given
	ontology vary between languages. For example, there are far fewer annotated
	examples in Spanish and Japanese than English and Chinese. Furthermore, using
	only language-specific training data results in the need to manually annotate
	equivalently large amounts of training data for each new language a system
	encounters. We propose a deep neural network to learn transferable,
	discriminative bilingual representations. Experiments on the ACE 2005
	multilingual training corpus demonstrate that the joint training process
	results in significant improvement in relation classification performance over
	the monolingual counterparts. The learnt representation is discriminative and
	transferable between languages. When using 10% (25K English words, or 30K
	Chinese characters) of the training data, our approach results in doubling F1
	compared to a monolingual baseline. We achieve comparable performance to the
	monolingual system trained with 250K English words (or 300K Chinese characters)
	with 50% of the training data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>min-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1069">
    <title>Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora</title>
    <author><first>Amir</first><last>Hazem</last></author>
    <author><first>Emmanuel</first><last>Morin</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>685&#8211;693</pages>
    <url>http://www.aclweb.org/anthology/I17-1069</url>
    <abstract>Bilingual lexicon extraction from comparable corpora is constrained by the
	small amount of available data when dealing with specialized domains. This
	aspect penalizes the performance of distributional-based approaches, which
	is closely related to the reliability of word cooccurrence counts extracted
	from comparable corpora. A solution to avoid this limitation is to associate
	external resources with the comparable corpus. Since bilingual word
	embeddings have recently been shown to be efficient models for learning
	bilingual distributed representations of words, we explore different word
	embedding models and show how a general-domain comparable corpus can enrich
	a specialized comparable corpus via neural networks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hazem-morin:2017:I17-1</bibkey>
  </paper>

  <paper id="1070">
    <title>A Bambara Tonalization System for Word Sense Disambiguation Using Differential Coding, Segmentation and Edit Operation Filtering</title>
    <author><first>Luigi (Yu-Cheng)</first><last>Liu</last></author>
    <author><first>Damien</first><last>Nouvel</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>694&#8211;703</pages>
    <url>http://www.aclweb.org/anthology/I17-1070</url>
    <abstract>In many languages such as Bambara or Arabic, tone markers (diacritics) may be
	written but are in practice often omitted. NLP applications are confronted
	with ambiguities and subsequent difficulties when processing such texts. To circumvent
	this problem, tonalization may be used, as a word sense disambiguation task,
	relying on context to add diacritics that partially disambiguate words as well
	as senses. In this paper, we describe our implementation of a Bambara tonalizer
	that adds tone markers using machine learning (CRFs). To make our tool
	efficient, we used differential coding, word segmentation and edit operation
	filtering. We describe our approach that allows tractable machine learning and
	improves accuracy: our model may be learned within minutes on a 358K-word
	corpus and reaches 92.3% accuracy.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>liu-nouvel:2017:I17-1</bibkey>
  </paper>

  <paper id="1071">
    <title>Joint Learning of Dialog Act Segmentation and Recognition in Spoken Dialog Using Neural Networks</title>
    <author><first>Tianyu</first><last>Zhao</last></author>
    <author><first>Tatsuya</first><last>Kawahara</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>704&#8211;712</pages>
    <url>http://www.aclweb.org/anthology/I17-1071</url>
    <abstract>Dialog act segmentation and recognition are basic natural language
	understanding tasks in spoken dialog systems. This paper investigates a unified
	architecture for these two tasks, which aims to improve the model's performance
	on both tasks. Compared with past joint models, the proposed
	architecture can (1) incorporate contextual information in dialog act
	recognition, and (2) integrate models for tasks of different levels as a whole,
	i.e. dialog act segmentation on the word level and dialog act recognition on
	the segment level. Experimental results show that the joint training system
	outperforms the simple cascading system and the joint coding system on both
	dialog act segmentation and recognition tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhao-kawahara:2017:I17-1</bibkey>
  </paper>

  <paper id="1072">
    <title>Predicting Users' Negative Feedbacks in Multi-Turn Human-Computer Dialogues</title>
    <author><first>Xin</first><last>Wang</last></author>
    <author><first>Jianan</first><last>Wang</last></author>
    <author><first>Yuanchao</first><last>Liu</last></author>
    <author><first>Xiaolong</first><last>Wang</last></author>
    <author><first>Zhuoran</first><last>Wang</last></author>
    <author><first>Baoxun</first><last>Wang</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>713&#8211;722</pages>
    <url>http://www.aclweb.org/anthology/I17-1072</url>
    <abstract>User experience is essential for human-computer dialogue systems. However, it
	is impractical to ask users to provide explicit feedback when the agents'
	responses displease them. Therefore, in this paper, we explore predicting
	users' imminent dissatisfaction with intelligent agents by analysing the
	existing utterances in the dialogue sessions. To our knowledge, this is the
	first work focusing on this task. Several possible factors that trigger
	negative emotions are modelled. A relation sequence model (RSM) is proposed to
	encode the sequence of appropriateness of the current response with respect to
	the earlier utterances. The experimental results show that the proposed
	structure is more effective at modelling emotional risk (the possibility of
	negative feedback) than existing conversation modelling approaches. Besides,
	strategies for obtaining distant supervision data for pre-training are also
	discussed in this work. Balanced sampling with respect to the last response in
	the distant supervision data is shown to be reliable for data augmentation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wang-EtAl:2017:I17-13</bibkey>
  </paper>

  <paper id="1073">
    <title>Finding Dominant User Utterances And System Responses in Conversations</title>
    <author><first>Dhiraj</first><last>Madan</last></author>
    <author><first>Sachindra</first><last>Joshi</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>723&#8211;732</pages>
    <url>http://www.aclweb.org/anthology/I17-1073</url>
    <abstract>There are several dialog frameworks which allow manual specification of intents
	and rule-based dialog flow. The rule-based framework provides good control to
	dialog designers at the expense of being more time-consuming and laborious. The
	job of a dialog designer can be reduced if we could identify pairs of user
	intents and corresponding responses automatically from prior conversations
	between users and agents. In this paper we propose an approach to find these
	frequent user utterances (which serve as examples for intents) and
	corresponding agent responses. We propose a novel SimCluster algorithm that
	extends standard K-means algorithm to simultaneously cluster user utterances
	and agent utterances by taking their adjacency information into account. The
	method also aligns these clusters to provide pairs of intents and response
	groups. We compare our results with those produced by simple K-means
	clustering on a real dataset and observe up to 10% absolute improvement in
	F1-scores. Through our experiments on a synthetic dataset, we show that our
	algorithm gains more of an advantage over the K-means algorithm when the data
	has large variance.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>madan-joshi:2017:I17-1</bibkey>
  </paper>

  <paper id="1074">
    <title>End-to-End Task-Completion Neural Dialogue Systems</title>
    <author><first>Xiujun</first><last>Li</last></author>
    <author><first>Yun-Nung</first><last>Chen</last></author>
    <author><first>Lihong</first><last>Li</last></author>
    <author><first>Jianfeng</first><last>Gao</last></author>
    <author><first>Asli</first><last>Celikyilmaz</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>733&#8211;743</pages>
    <url>http://www.aclweb.org/anthology/I17-1074</url>
    <abstract>One of the major drawbacks of modularized task-completion dialogue systems is
	that each module is trained individually, which presents several challenges.
	For example, downstream modules are affected by earlier modules, and the
	performance of the entire system is not robust to the accumulated errors. This
	paper presents a novel end-to-end learning framework for task-completion
	dialogue systems to tackle such issues. Our neural dialogue system can directly
	interact with a structured database to assist users in accessing information
	and accomplishing certain tasks. The reinforcement learning based dialogue
	manager offers robust capabilities to handle noise caused by other components
	of the dialogue system. Our experiments in a movie-ticket booking domain show
	that our end-to-end system not only outperforms modularized dialogue system
	baselines for both objective and subjective evaluation, but is also robust to
	noise, as demonstrated by several systematic experiments with different error
	granularities and rates specific to the language understanding module.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>li-EtAl:2017:I17-11</bibkey>
  </paper>

  <paper id="1075">
    <title>End-to-end Network for Twitter Geolocation Prediction and Hashing</title>
    <author><first>Jey Han</first><last>Lau</last></author>
    <author><first>Lianhua</first><last>Chi</last></author>
    <author><first>Khoi-Nguyen</first><last>Tran</last></author>
    <author><first>Trevor</first><last>Cohn</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>744&#8211;753</pages>
    <url>http://www.aclweb.org/anthology/I17-1075</url>
    <abstract>We propose an end-to-end neural network to predict the geolocation of a tweet.
	The network takes as input a number of raw Twitter metadata such as the tweet
	message and associated user account information. Our model is language
	independent, and despite minimal feature engineering, it is interpretable and
	capable of learning location-indicative words and timing patterns. Our model
	outperforms state-of-the-art systems by 2%-6%. Additionally, we
	propose extensions to the model to compress the representation learnt by the
	network into binary codes. Experiments show that it produces compact codes
	compared to benchmark hashing algorithms. An implementation of the model is
	released publicly.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lau-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1076">
    <title>Assessing the Verifiability of Attributions in News Text</title>
    <author><first>Edward</first><last>Newell</last></author>
    <author><first>Ariane</first><last>Schang</last></author>
    <author><first>Drew</first><last>Margolin</last></author>
    <author><first>Derek</first><last>Ruths</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>754&#8211;763</pages>
    <url>http://www.aclweb.org/anthology/I17-1076</url>
    <dataset>I17-1076.Datasets.zip</dataset>
    <abstract>When reporting the news, journalists rely on the statements of stakeholders,
	experts, and officials.  The attribution of such a statement is verifiable if
	its fidelity to the source can be confirmed or denied. In this paper, we
	develop a new NLP task: determining the verifiability of an attribution based
	on linguistic cues.  We operationalize the notion of verifiability as a score
	between 0 and 1 using human judgments in a comparison-based approach.  Using
	crowdsourcing, we create a dataset of verifiability-scored attributions, and
	demonstrate a model that achieves an RMSE of 0.057 and Spearman's rank
	correlation of 0.95 to human-generated scores. We discuss the application of
	this technique to the analysis of mass media.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>newell-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1077">
    <title>Domain Adaptation from User-level Facebook Models to County-level Twitter Predictions</title>
    <author><first>Daniel</first><last>Rieman</last></author>
    <author><first>Kokil</first><last>Jaidka</last></author>
    <author><first>H. Andrew</first><last>Schwartz</last></author>
    <author><first>Lyle</first><last>Ungar</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>764&#8211;773</pages>
    <url>http://www.aclweb.org/anthology/I17-1077</url>
    <abstract>Several studies have demonstrated how language models of user attributes, such
	as personality, can be built by using the Facebook language of social media
	users in conjunction with their responses to psychology questionnaires. It is
	challenging to apply these models to make general predictions about attributes
	of communities, such as personality distributions across US counties, because
	of: (1) the potential unavailability of the original training data due to
	privacy and ethical regulations, (2) the need to adapt Facebook language
	models to Twitter language without retraining the model, and (3) the need to
	adapt from users to county-level collections of tweets. We propose a two-step
	algorithm,
	Target Side Domain Adaptation (TSDA) for such domain adaptation when no labeled
	Twitter/county data is available. TSDA corrects for the different word
	distributions between Facebook and Twitter and for the varying word
	distributions across counties by adjusting target side word frequencies; no
	changes to the trained model are made. In the case of predicting the Big Five
	county-level personality traits, TSDA outperforms a state-of-the-art domain
	adaptation method, giving county-level predictions that have fewer extreme
	outliers, higher year-to-year stability, and higher correlation with
	county-level outcomes.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rieman-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1078">
    <title>Recognizing Explicit and Implicit Hate Speech Using a Weakly Supervised Two-path Bootstrapping Approach</title>
    <author><first>Lei</first><last>Gao</last></author>
    <author><first>Alexis</first><last>Kuppersmith</last></author>
    <author><first>Ruihong</first><last>Huang</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>774&#8211;782</pages>
    <url>http://www.aclweb.org/anthology/I17-1078</url>
    <abstract>In the wake of a polarizing election, social media is laden with hateful
	content. To address various limitations of supervised hate speech
	classification methods including corpus bias and huge cost of annotation, we
	propose a weakly supervised two-path bootstrapping approach for an online hate
	speech detection model leveraging large-scale unlabeled data. This system
	significantly outperforms hate speech detection systems that are trained in a
	supervised manner using manually annotated data. Applying this model on a large
	quantity of tweets collected before, after, and on election day reveals
	motivations and patterns of inflammatory language.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gao-kuppersmith-huang:2017:I17-1</bibkey>
  </paper>

  <paper id="1079">
    <title>Estimating Reactions and Recommending Products with Generative Models of Reviews</title>
    <author><first>Jianmo</first><last>Ni</last></author>
    <author><first>Zachary C.</first><last>Lipton</last></author>
    <author><first>Sharad</first><last>Vikram</last></author>
    <author><first>Julian</first><last>McAuley</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>783&#8211;791</pages>
    <url>http://www.aclweb.org/anthology/I17-1079</url>
    <abstract>Traditional approaches to recommendation focus on learning from large volumes
	of historical feedback to estimate simple numerical quantities (Will a user
	click on a product? Make a purchase? etc.). Natural language approaches that
	model information like product reviews have proved to be incredibly useful in
	improving the performance of such methods, as reviews provide valuable
	auxiliary information that can be used to better estimate latent user
	preferences and item properties.
	In this paper, rather than using reviews as inputs to a recommender system,
	we focus on generating reviews as the model's output. This requires us to
	efficiently model text (at the character level) to capture the preferences of
	the user, the properties of the item being consumed, and the interaction
	between them (i.e., the user's preference). We show that this model can be
	used to (a) generate plausible reviews and estimate nuanced reactions; (b)
	provide personalized rankings of existing reviews; and (c) recommend existing
	products more effectively.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ni-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1080">
    <title>Summarizing Lengthy Questions</title>
    <author><first>Tatsuya</first><last>Ishigaki</last></author>
    <author><first>Hiroya</first><last>Takamura</last></author>
    <author><first>Manabu</first><last>Okumura</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>792&#8211;800</pages>
    <url>http://www.aclweb.org/anthology/I17-1080</url>
    <abstract>In this research, we propose the task of question summarization.
	We first analyzed question-summary pairs extracted from a Community Question
	Answering (CQA) site, and found that a proportion of questions cannot be
	summarized by extractive approaches but instead require abstractive approaches.
	We created a dataset by regarding the question-title pairs posted on the CQA
	site as question-summary pairs.
	By using the data, we trained extractive and abstractive summarization models,
	and compared them based on ROUGE scores and manual evaluations.
	Our experimental results show that an abstractive method using an
	encoder-decoder model with a copying mechanism achieves better scores on both
	ROUGE-2 F-measure and evaluations by human judges.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ishigaki-takamura-okumura:2017:I17-1</bibkey>
  </paper>

  <paper id="1081">
    <title>Concept-Map-Based Multi-Document Summarization using Concept Coreference Resolution and Global Importance Optimization</title>
    <author><first>Tobias</first><last>Falke</last></author>
    <author><first>Christian M.</first><last>Meyer</last></author>
    <author><first>Iryna</first><last>Gurevych</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>801&#8211;811</pages>
    <url>http://www.aclweb.org/anthology/I17-1081</url>
    <abstract>Concept-map-based multi-document summarization is a variant of traditional
	summarization that produces structured summaries in the form of concept maps.
	In this work, we propose a new model for the task that addresses several issues
	in previous methods. It learns to identify and merge coreferent concepts to
	reduce redundancy, determines their importance with a strong supervised model
	and finds an optimal summary concept map via integer linear programming. It is
	also computationally more efficient than previous methods, allowing us to
	summarize larger document sets. We evaluate the model on two datasets, finding
	that it outperforms several approaches from previous work.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>falke-meyer-gurevych:2017:I17-1</bibkey>
  </paper>

  <paper id="1082">
    <title>Abstractive Multi-document Summarization by Partial Tree Extraction, Recombination and Linearization</title>
    <author><first>Litton</first><last>J Kurisinkel</last></author>
    <author><first>Yue</first><last>Zhang</last></author>
    <author><first>Vasudeva</first><last>Varma</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>812&#8211;821</pages>
    <url>http://www.aclweb.org/anthology/I17-1082</url>
    <abstract>Existing work on abstractive multi-document summarization utilises existing
	phrase structures directly extracted from input documents to generate summary
	sentences. These methods can suffer from a lack of consistency and coherence
	when merging phrases. We introduce a novel approach to abstractive
	multi-document summarization through partial dependency tree extraction,
	recombination and linearization. The method entrusts the summarizer to
	generate its own topically coherent sequential structures from scratch for
	effective communication. Results on TAC 2011, DUC-2004 and DUC-2005 show that
	our system gives competitive results compared with state-of-the-art
	abstractive summarization approaches in the literature. We also achieve
	competitive results in linguistic quality as assessed by human evaluators.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jkurisinkel-zhang-varma:2017:I17-1</bibkey>
  </paper>

  <paper id="1083">
    <title>Event Argument Identification on Dependency Graphs with Bidirectional LSTMs</title>
    <author><first>Alex</first><last>Judea</last></author>
    <author><first>Michael</first><last>Strube</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>822&#8211;831</pages>
    <url>http://www.aclweb.org/anthology/I17-1083</url>
    <abstract>In this paper we investigate the performance of event argument identification.
	We show that the performance is tied to syntactic complexity. 
	Based on this finding, we propose a novel and effective system for event
	argument identification. Recurrent Neural Networks learn to produce meaningful
	representations of long and short dependency paths. Convolutional Neural
	Networks learn to decompose the lexical context of argument candidates. They
	are combined into a simple system which outperforms a feature-based,
	state-of-the-art event argument identifier without any manual feature
	engineering.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>judea-strube:2017:I17-1</bibkey>
  </paper>

  <paper id="1084">
    <title>Selective Decoding for Cross-lingual Open Information Extraction</title>
    <author><first>Sheng</first><last>Zhang</last></author>
    <author><first>Kevin</first><last>Duh</last></author>
    <author><first>Benjamin</first><last>Van Durme</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>832&#8211;842</pages>
    <url>http://www.aclweb.org/anthology/I17-1084</url>
    <abstract>Cross-lingual open information extraction is the task of distilling facts from
	the source language into representations in the target language. We propose a
	novel encoder-decoder model for this problem. It employs a novel selective
	decoding mechanism, which explicitly models the sequence labeling process as
	well as the sequence generation process on the decoder side. Compared to a
	standard encoder-decoder model, selective decoding significantly increases the
	performance on a Chinese-English cross-lingual open IE dataset by 3.87-4.49
	BLEU and 1.91-5.92 F1. We also extend our approach to low-resource scenarios,
	and obtain promising improvements.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhang-duh-vandurme:2017:I17-1</bibkey>
  </paper>

  <paper id="1085">
    <title>Event Ordering with a Generalized Model for Sieve Prediction Ranking</title>
    <author><first>Bill</first><last>McDowell</last></author>
    <author><first>Nathanael</first><last>Chambers</last></author>
    <author><first>Alexander</first><last>Ororbia II</last></author>
    <author><first>David</first><last>Reitter</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>843&#8211;853</pages>
    <url>http://www.aclweb.org/anthology/I17-1085</url>
    <abstract>This paper improves on several aspects of a sieve-based event ordering
	architecture, CAEVO (Chambers et al., 2014), which creates globally
	consistent temporal relations between events and time expressions. First, we
	examine the usage of word embeddings and semantic role features. With the
	incorporation of these new features, we demonstrate a 5% relative F1 gain
	over our replicated version of CAEVO. Second, we reformulate the
	architecture's sieve-based inference algorithm as a prediction reranking
	method that approximately optimizes a scoring function computed using
	classifier precisions. Within this prediction reranking framework, we
	propose an alternative scoring function, showing an 8.8% relative gain over
	the original CAEVO. We further include an in-depth analysis of one of the
	main datasets that is used to evaluate temporal classifiers, and we show
	how, despite using the densest corpus, there is still a danger of
	overfitting. While this paper focuses on temporal ordering, its results are
	applicable to other areas that use sieve-based architectures.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mcdowell-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1086">
    <title>Open Relation Extraction and Grounding</title>
    <author><first>Dian</first><last>Yu</last></author>
    <author><first>Lifu</first><last>Huang</last></author>
    <author><first>Heng</first><last>Ji</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>854&#8211;864</pages>
    <url>http://www.aclweb.org/anthology/I17-1086</url>
    <abstract>Previous open Relation Extraction (open RE) approaches mainly rely on
	linguistic patterns and constraints to extract important relational triples
	from large-scale corpora. However, they lack the ability to cover diverse
	relation expressions or to measure the relative importance of candidate triples
	within a sentence. It is also challenging to name the relation type of a
	relational triple merely based on context words, which could limit the
	usefulness of open RE in downstream applications. We propose a novel
	importance-based open RE approach by exploiting the global structure of a
	dependency tree to extract salient triples. We design an unsupervised relation
	type naming method by grounding relational triples to a large-scale Knowledge
	Base (KB) schema, leveraging KB triples and weighted context words associated
	with relational triples. Experiments on the English Slot Filling 2013 dataset
	demonstrate that our approach achieves 8.1% higher F-score over
	state-of-the-art open RE methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yu-huang-ji:2017:I17-1</bibkey>
  </paper>

  <paper id="1087">
    <title>Extraction of Gene-Environment Interaction from the Biomedical Literature</title>
    <author><first>Jinseon</first><last>You</last></author>
    <author><first>Jin-Woo</first><last>Chung</last></author>
    <author><first>Wonsuk</first><last>Yang</last></author>
    <author><first>Jong C.</first><last>Park</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>865&#8211;874</pages>
    <url>http://www.aclweb.org/anthology/I17-1087</url>
    <abstract>Genetic information in the literature has been extensively looked into for the
	purpose of discovering the etiology of a disease. As the gene-disease relation
	is sensitive to external factors, the identification of these factors is
	important for studying a disease. Environmental influences, which are usually
	called Gene-Environment
	interaction (GxE), have been considered as important factors and have
	extensively been researched in biology. Nevertheless, there is still a lack of
	systems for automatic GxE extraction from the biomedical literature due to new
	challenges: (1) there are no preprocessing tools and corpora for GxE, (2)
	expressions of GxE are often quite implicit, and (3) document-level
	comprehension is usually required. We propose to overcome these challenges with
	neural network models and show that a modified sequence-to-sequence model with
	a static RNN decoder produces a good performance in GxE recognition.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>you-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1088">
    <title>Course Concept Extraction in MOOCs via Embedding-Based Graph Propagation</title>
    <author><first>Liangming</first><last>Pan</last></author>
    <author><first>Xiaochen</first><last>Wang</last></author>
    <author><first>Chengjiang</first><last>Li</last></author>
    <author><first>Juanzi</first><last>Li</last></author>
    <author><first>Jie</first><last>Tang</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>875&#8211;884</pages>
    <url>http://www.aclweb.org/anthology/I17-1088</url>
    <abstract>Massive Open Online Courses (MOOCs), offering a new way to study online, are
	revolutionizing education. One challenging issue in MOOCs is how to design
	effective and fine-grained course concepts such that students with different
	backgrounds can grasp the essence of the course. In this paper, we conduct a
	systematic investigation of the problem of course concept extraction for MOOCs.
	We propose to learn latent representations for candidate concepts via an
	embedding-based method. Moreover, we develop a graph-based propagation
	algorithm to rank the candidate concepts based on the learned representations.
	We evaluate the proposed method using different courses from XuetangX and
	Coursera. Experimental results show that our method significantly outperforms
	all the alternative methods (+0.013-0.318 in terms of R-precision; p&lt;&lt;0.01,
	t-test).</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>pan-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1089">
    <title>Identity Deception Detection</title>
    <author><first>Ver&#243;nica</first><last>P&#233;rez-Rosas</last></author>
    <author><first>Quincy</first><last>Davenport</last></author>
    <author><first>Anna Mengdan</first><last>Dai</last></author>
    <author><first>Mohamed</first><last>Abouelenien</last></author>
    <author><first>Rada</first><last>Mihalcea</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>885&#8211;894</pages>
    <url>http://www.aclweb.org/anthology/I17-1089</url>
    <abstract>This paper addresses the task of detecting identity deception in language.
	Using a novel identity deception dataset, consisting of real and portrayed
	identities from 600 individuals, we show that we can build accurate identity
	detectors targeting both age and gender, with accuracies of up to 88%. We also
	perform an analysis of the linguistic patterns used in identity deception,
	which leads to interesting insights into identity portrayers.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>perezrosas-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1090">
    <title>Learning to Diagnose: Assimilating Clinical Narratives using Deep Reinforcement Learning</title>
    <author><first>Yuan</first><last>Ling</last></author>
    <author><first>Sadid A.</first><last>Hasan</last></author>
    <author><first>Vivek</first><last>Datla</last></author>
    <author><first>Ashequl</first><last>Qadir</last></author>
    <author><first>Kathy</first><last>Lee</last></author>
    <author><first>Joey</first><last>Liu</last></author>
    <author><first>Oladimeji</first><last>Farri</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>895&#8211;905</pages>
    <url>http://www.aclweb.org/anthology/I17-1090</url>
    <abstract>Clinical diagnosis is a critical and non-trivial aspect of patient care which
	often requires significant medical research and investigation based on an
	underlying clinical scenario. This paper proposes a novel approach by
	formulating clinical diagnosis as a reinforcement learning problem. During
	training, the reinforcement learning agent mimics the clinician's cognitive
	process and learns the optimal policy to obtain the most appropriate diagnoses
	for a clinical narrative. This is achieved through an iterative search for
	candidate diagnoses from external knowledge sources via a sentence-by-sentence
	analysis of the inherent clinical context. A deep Q-network architecture is
	trained to optimize a reward function that measures the accuracy of the
	candidate diagnoses. Experiments on the TREC CDS datasets demonstrate the
	effectiveness of our system over various non-reinforcement learning-based
	systems.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ling-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1091">
    <title>Dataset for a Neural Natural Language Interface for Databases (NNLIDB)</title>
    <author><first>Florin</first><last>Brad</last></author>
    <author><first>Radu Cristian Alexandru</first><last>Iacob</last></author>
    <author><first>Ionel Alexandru</first><last>Hosu</last></author>
    <author><first>Traian</first><last>Rebedea</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>906&#8211;914</pages>
    <url>http://www.aclweb.org/anthology/I17-1091</url>
    <dataset>I17-1091.Datasets.zip</dataset>
    <abstract>Progress in natural language interfaces to databases (NLIDB) has been slow
	mainly due to linguistic issues (such as language ambiguity) and domain
	portability. Moreover, the lack of a large corpus to be used as a standard
	benchmark has made data-driven approaches difficult to develop and compare. In
	this paper, we revisit the problem of NLIDBs and recast it as a sequence
	translation problem. To this end, we introduce a large dataset extracted from
	the Stack Exchange Data Explorer website, which can be used for training neural
	natural language interfaces for databases. We also report encouraging baseline
	results on a smaller manually annotated test corpus, obtained using an
	attention-based sequence-to-sequence neural network.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>brad-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1092">
    <title>Acquisition and Assessment of Semantic Content for the Generation of Elaborateness and Indirectness in Spoken Dialogue Systems</title>
    <author><first>Louisa</first><last>Pragst</last></author>
    <author><first>Koichiro</first><last>Yoshino</last></author>
    <author><first>Wolfgang</first><last>Minker</last></author>
    <author><first>Satoshi</first><last>Nakamura</last></author>
    <author><first>Stefan</first><last>Ultes</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>915&#8211;925</pages>
    <url>http://www.aclweb.org/anthology/I17-1092</url>
    <attachment type="note">I17-1092.Notes.pdf</attachment>
    <abstract>In a dialogue system, the dialogue manager selects one of several system
	actions and thereby determines the system's behaviour. Defining all possible
	system actions in a dialogue system by hand is tedious work. While efforts
	have been made to automatically generate such system actions, those approaches
	are mostly focused on providing functional system behaviour. Adapting the
	system behaviour to the user becomes a difficult task due to the limited number
	of system actions available. We aim to increase the adaptability of a dialogue
	system by automatically generating variants of system actions. In this work, we
	introduce an approach to automatically generate action variants for
	elaborateness and indirectness. Our proposed algorithm extracts RDF triplets
	from a knowledge base and rates their relevance to the original system action
	to find suitable content. We show that the results of our algorithm are mostly
	perceived similarly to human-generated elaborateness and indirectness and can
	be used to adapt a conversation to the current user and situation. We also
	discuss where the results of our algorithm are still lacking and how this could
	be improved: Taking into account the conversation topic as well as the culture
	of the user is likely to have a beneficial effect on the user's perception.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>pragst-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1093">
    <title>Demographic Word Embeddings for Racism Detection on Twitter</title>
    <author><first>Mohammed</first><last>Hasanuzzaman</last></author>
    <author><first>Ga&#235;l</first><last>Dias</last></author>
    <author><first>Andy</first><last>Way</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>926&#8211;936</pages>
    <url>http://www.aclweb.org/anthology/I17-1093</url>
    <abstract>Most social media platforms grant users freedom of speech by allowing them to
	freely express their thoughts, beliefs, and opinions. Although this represents
	incredible and unique communication opportunities, it also presents important
	challenges. Online racism is one such example. In this study, we present a
	supervised learning strategy to detect racist language on Twitter based on word
	embeddings that incorporate demographic (age, gender, and location) information.
	Our methodology achieves reasonable classification accuracy over a gold
	standard dataset (F1=76.3%) and significantly improves over the classification
	performance of demographic-agnostic models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hasanuzzaman-dias-way:2017:I17-1</bibkey>
  </paper>

  <paper id="1094">
    <title>Automatically Extracting Variant-Normalization Pairs for Japanese Text Normalization</title>
    <author><first>Itsumi</first><last>Saito</last></author>
    <author><first>Kyosuke</first><last>Nishida</last></author>
    <author><first>Kugatsu</first><last>Sadamitsu</last></author>
    <author><first>Kuniko</first><last>Saito</last></author>
    <author><first>Junji</first><last>Tomita</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>937&#8211;946</pages>
    <url>http://www.aclweb.org/anthology/I17-1094</url>
    <abstract>Social media texts, such as tweets from Twitter, contain many types of
	non-standard tokens, and the number of normalization approaches for handling
	such noisy text has been increasing. We present a method for automatically
	extracting pairs of a variant word and its normal form from unsegmented text on
	the basis of a pair-wise similarity approach. We incorporated the acquired
	variant-normalization pairs into Japanese morphological analysis. The
	experimental results show that our method can extract widely covered variants
	from large Twitter data and improve the recall of normalization without
	degrading the overall accuracy of Japanese morphological analysis.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>saito-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1095">
    <title>Semantic Document Distance Measures and Unsupervised Document Revision Detection</title>
    <author><first>Xiaofeng</first><last>Zhu</last></author>
    <author><first>Diego</first><last>Klabjan</last></author>
    <author><first>Patrick</first><last>Bless</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>947&#8211;956</pages>
    <url>http://www.aclweb.org/anthology/I17-1095</url>
    <software>I17-1095.Software.txt</software>
    <dataset>I17-1095.Datasets.txt</dataset>
    <abstract>In this paper, we model the document revision detection problem as a minimum
	cost branching problem that relies on computing document distances.
	Furthermore, we propose two new document distance measures, word vector-based
	Dynamic Time Warping (wDTW) and word vector-based Tree Edit Distance (wTED).
	Our revision detection system is designed for a large scale corpus and
	implemented in Apache Spark. We demonstrate that our system can more precisely
	detect revisions than state-of-the-art methods by utilizing the Wikipedia
	revision dumps and simulated data sets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhu-klabjan-bless:2017:I17-1</bibkey>
  </paper>

  <paper id="1096">
    <title>An Empirical Analysis of Multiple-Turn Reasoning Strategies in Reading Comprehension Tasks</title>
    <author><first>Yelong</first><last>Shen</last></author>
    <author><first>Xiaodong</first><last>Liu</last></author>
    <author><first>Kevin</first><last>Duh</last></author>
    <author><first>Jianfeng</first><last>Gao</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>957&#8211;966</pages>
    <url>http://www.aclweb.org/anthology/I17-1096</url>
    <abstract>Reading comprehension (RC) is a challenging task that requires synthesis of
	information across sentences and multiple turns of reasoning. 
	Using a state-of-the-art RC model, we empirically investigate the performance
	of single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets.
	The RC model is an end-to-end neural network with iterative attention, and uses
	reinforcement learning to dynamically control the number of turns. 
	We find that multiple-turn reasoning outperforms single-turn reasoning for all
	question and answer types; further, we observe that enabling a flexible number
	of turns generally improves upon a fixed multiple-turn strategy. 
	We achieve results competitive with the state-of-the-art on these two datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shen-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1097">
    <title>Automated Historical Fact-Checking by Passage Retrieval, Word Statistics, and Virtual Question-Answering</title>
    <author><first>Mio</first><last>Kobayashi</last></author>
    <author><first>Ai</first><last>Ishii</last></author>
    <author><first>Chikara</first><last>Hoshino</last></author>
    <author><first>Hiroshi</first><last>Miyashita</last></author>
    <author><first>Takuya</first><last>Matsuzaki</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>967&#8211;975</pages>
    <url>http://www.aclweb.org/anthology/I17-1097</url>
    <abstract>This paper presents a hybrid approach to the verification of statements about
	historical facts. The test data was collected from the world history
	examinations in a standardized achievement test for high school students. The
	data includes various kinds of false statements that were carefully written so
	as to deceive the students while still being disprovable on the basis of the
	teaching materials. Our system predicts the truth or falsehood of a statement
	based on text search, word cooccurrence statistics, factoid-style question
	answering, and temporal relation recognition. These features contribute
	complementarily to the judgement and achieve state-of-the-art accuracy.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kobayashi-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1098">
    <title>Integrating Subject, Type, and Property Identification for Simple Question Answering over Knowledge Base</title>
    <author><first>Wei-Chuan</first><last>Hsiao</last></author>
    <author><first>Hen-Hsen</first><last>Huang</last></author>
    <author><first>Hsin-Hsi</first><last>Chen</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>976&#8211;985</pages>
    <url>http://www.aclweb.org/anthology/I17-1098</url>
    <abstract>This paper presents an approach to identify the subject, type and property
	from a knowledge base (KB) for answering simple questions. We propose new
	features to rank entity candidates in the KB. Besides, we split a relation in
	the KB into type and
	property. Each of them is modeled by a bi-directional LSTM. Experimental
	results show that our model achieves the state-of-the-art performance on the
	SimpleQuestions dataset. The hard questions in the experiments are also
	analyzed in detail.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hsiao-huang-chen:2017:I17-1</bibkey>
  </paper>

  <paper id="1099">
    <title>DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset</title>
    <author><first>Yanran</first><last>Li</last></author>
    <author><first>Hui</first><last>Su</last></author>
    <author><first>Xiaoyu</first><last>Shen</last></author>
    <author><first>Wenjie</first><last>Li</last></author>
    <author><first>Ziqiang</first><last>Cao</last></author>
    <author><first>Shuzi</first><last>Niu</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>986&#8211;995</pages>
    <url>http://www.aclweb.org/anthology/I17-1099</url>
    <dataset>I17-1099.Datasets.zip</dataset>
    <abstract>We develop a high-quality multi-turn dialog dataset, DailyDialog,
	which is intriguing in several aspects. The language is human-written and less
	noisy. The dialogues in the dataset reflect the way we communicate in daily
	life and cover various topics about our daily life. We also manually label the
	developed dataset with communication intention and emotion information. Then,
	we evaluate existing approaches on the DailyDialog dataset and hope it will
	benefit research on dialog systems. The dataset is available at
	http://yanran.li/dailydialog</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>li-EtAl:2017:I17-12</bibkey>
  </paper>

  <paper id="1100">
    <title>Inference is Everything: Recasting Semantic Resources into a Unified Evaluation Framework</title>
    <author><first>Aaron Steven</first><last>White</last></author>
    <author><first>Pushpendre</first><last>Rastogi</last></author>
    <author><first>Kevin</first><last>Duh</last></author>
    <author><first>Benjamin</first><last>Van Durme</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>996&#8211;1005</pages>
    <url>http://www.aclweb.org/anthology/I17-1100</url>
    <abstract>We propose to unify a variety of existing semantic classification tasks, such
	as semantic role labeling, anaphora resolution, and paraphrase detection, under
	the heading of Recognizing Textual Entailment (RTE). We present a general
	strategy to automatically generate one or more sentential hypotheses based on
	an input sentence and pre-existing manual semantic annotations. The resulting
	suite of datasets enables us to probe a statistical RTE model's performance on
	different aspects of semantics. We demonstrate the value of this approach by
	investigating the behavior of a popular neural network RTE model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>white-EtAl:2017:I17-1</bibkey>
  </paper>

  <paper id="1101">
    <title>Generating a Training Corpus for OCR Post-Correction Using Encoder-Decoder Model</title>
    <author><first>Eva</first><last>D'hondt</last></author>
    <author><first>Cyril</first><last>Grouin</last></author>
    <author><first>Brigitte</first><last>Grau</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>1006&#8211;1014</pages>
    <url>http://www.aclweb.org/anthology/I17-1101</url>
    <abstract>In this paper we present a novel approach to the automatic correction of
	OCR-induced orthographic errors in a given text. While current systems depend
	heavily on large training corpora or external information, such as
	domain-specific lexicons or confidence scores from the OCR process, our system
	only requires a small amount of (relatively) clean training data from a
	representative corpus to learn a character-based statistical language model
	using Bidirectional Long Short-Term Memory Networks (biLSTMs). We demonstrate
	the versatility and adaptability of our system on different text corpora with
	varying degrees of textual noise, including a real-life OCR corpus in the
	medical domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dhondt-grouin-grau:2017:I17-1</bibkey>
  </paper>

  <paper id="1102">
    <title>Multilingual Hierarchical Attention Networks for Document Classification</title>
    <author><first>Nikolaos</first><last>Pappas</last></author>
    <author><first>Andrei</first><last>Popescu-Belis</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>1015&#8211;1025</pages>
    <url>http://www.aclweb.org/anthology/I17-1102</url>
    <abstract>Hierarchical attention networks have recently achieved remarkable performance
	for document classification in a given language.  However, when multilingual
	document collections are considered, training such models separately for each
	language entails linear parameter growth and lack of cross-language transfer.
	Learning a single multilingual model with fewer parameters is therefore a
	challenging but potentially beneficial objective. To this end, we propose
	multilingual hierarchical attention networks for learning document structures,
	with shared encoders and/or shared attention mechanisms across languages, using
	multi-task learning and an aligned semantic space as input.  We evaluate the
	proposed models on multilingual document classification with disjoint label
	sets, on a large dataset which we provide, with 600k news documents in 8
	languages, and 5k labels.  The multilingual models outperform monolingual ones
	in low-resource as well as full-resource settings, and use fewer parameters,
	thus confirming their computational efficiency and the utility of
	cross-language transfer.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>pappas-popescubelis:2017:I17-1</bibkey>
  </paper>

  <paper id="1103">
    <title>Roles and Success in Wikipedia Talk Pages: Identifying Latent Patterns of Behavior</title>
    <author><first>Keith</first><last>Maki</last></author>
    <author><first>Michael</first><last>Yoder</last></author>
    <author><first>Yohan</first><last>Jo</last></author>
    <author><first>Carolyn</first><last>Ros&#233;</last></author>
    <booktitle>Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>1026&#8211;1035</pages>
    <url>http://www.aclweb.org/anthology/I17-1103</url>
    <abstract>In this work we investigate how role-based behavior profiles of a Wikipedia
	editor, considered against the backdrop of roles taken up by other editors in
	discussions, predict the success of the editor at achieving an impact on the
	associated article.
	We first contribute a new public dataset that includes a task of predicting the
	success of Wikipedia editors involved in discussion, measured by an
	operationalization of the lasting impact of their edits on the article.
	We then propose a probabilistic graphical model that advances earlier work on
	inducing latent discussion roles by using the light supervision of success in
	the negotiation task.
	We evaluate the performance of the model and interpret findings about roles and
	group configurations that lead to certain outcomes on Wikipedia.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>maki-EtAl:2017:I17-1</bibkey>
  </paper>

</volume>