<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="4100">
    <title>Proceedings of the First Workshop on Subword and Character Level Models in NLP</title>
    <editor>Manaal Faruqui</editor>
    <editor>Hinrich Sch&#252;tze</editor>
    <editor>Isabel Trancoso</editor>
    <editor>Yadollah Yaghoobzadeh</editor>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-41</url>
    <bibtype>book</bibtype>
    <bibkey>SCLeM:2017</bibkey>
  </paper>

  <paper id="4101">
    <title>Character and Subword-Based Word Representation for Neural Language Modeling Prediction</title>
    <author><first>Matthieu</first><last>Labeau</last></author>
    <author><first>Alexandre</first><last>Allauzen</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;13</pages>
    <url>http://www.aclweb.org/anthology/W17-4101</url>
    <abstract>Most neural language models use different kinds of embeddings for
	  word prediction. While word embeddings can be associated with each
	  word in the vocabulary or derived from characters as well as from a
	  factored morphological decomposition, these word representations are
	  mainly used to parametrize the input, i.e. the context of
	  prediction.  This work investigates the effect of using subword
	  units (character and factored morphological decomposition) to build
	  output representations for neural language modeling. We present a
	  case study on Czech, a morphologically-rich language, experimenting
	  with different input and output representations.  When working with
	  the full training vocabulary, despite unstable training, our
	  experiments show that augmenting the output word representations
	  with character-based embeddings can significantly improve the
	  performance of the model. Moreover, reducing the size of the output
	  look-up table, to let the character-based embeddings represent rare
	  words, brings further improvement.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>labeau-allauzen:2017:SCLeM</bibkey>
  </paper>

  <paper id="4102">
    <title>Learning variable length units for SMT between related languages via Byte Pair Encoding</title>
    <author><first>Anoop</first><last>Kunchukuttan</last></author>
    <author><first>Pushpak</first><last>Bhattacharyya</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>14&#8211;24</pages>
    <url>http://www.aclweb.org/anthology/W17-4102</url>
    <abstract>We explore the use of segments learnt using Byte Pair Encoding (referred to as
	BPE units) as basic units for statistical machine translation between related
	languages and compare them with orthographic syllables, which are currently the
	best performing basic units for this translation task. BPE identifies the most
	frequent character sequences as basic units, while orthographic syllables are
	linguistically motivated pseudo-syllables. We show that BPE units modestly
	outperform orthographic syllables as units of translation, with up to an 11%
	increase in BLEU score. While orthographic syllables can be used only for
	languages whose writing systems use vowel representations, BPE is writing
	system independent and we show that BPE outperforms other units for non-vowel
	writing systems too. Our results are supported by extensive experimentation
	spanning multiple language families and writing systems.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kunchukuttan-bhattacharyya:2017:SCLeM</bibkey>
  </paper>

  <paper id="4103">
    <title>Character Based Pattern Mining for Neology Detection</title>
    <author><first>Ga&#235;l</first><last>Lejeune</last></author>
    <author><first>Emmanuel</first><last>Cartier</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>25&#8211;30</pages>
    <url>http://www.aclweb.org/anthology/W17-4103</url>
    <abstract>Detecting neologisms is essential in real-time natural language processing
	applications. Not only does it make it possible to follow the lexical evolution of
	languages, but it is also essential for updating linguistic resources and
	parsers.
	 In this paper, neology detection is considered a classification task where
	a system has to assess whether a given lexical item is an actual neologism or
	not.
	 We propose a combination of an unsupervised data mining technique and a
	supervised machine learning approach.
	 It is inspired by current research in stylometry and on token-level and
	character-level patterns. 
	 We train and evaluate our system on a manually designed reference dataset in
	French and Russian.
	 We show that this approach is able to largely outperform state-of-the-art
	neology detection systems. Furthermore, character-level patterns exhibit good
	properties for multilingual extensions of the system.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lejeune-cartier:2017:SCLeM</bibkey>
  </paper>

  <paper id="4104">
    <title>Automated Word Stress Detection in Russian</title>
    <author><first>Maria</first><last>Ponomareva</last></author>
    <author><first>Kirill</first><last>Milintsevich</last></author>
    <author><first>Ekaterina</first><last>Chernyak</last></author>
    <author><first>Anatoly</first><last>Starostin</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>31&#8211;35</pages>
    <url>http://www.aclweb.org/anthology/W17-4104</url>
    <abstract>In this study we address the problem of automated word stress detection in
	Russian using character-level models and no part-of-speech taggers. We use a
	simple bidirectional RNN with LSTM nodes and achieve an accuracy of 90% or
	higher. We experiment with two training datasets and show that using the data
	from an annotated corpus is much more efficient than using only a dictionary,
	since it allows the model to retain the context of the word and its morphological
	features.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ponomareva-EtAl:2017:SCLeM</bibkey>
  </paper>

  <paper id="4105">
    <title>A Syllable-based Technique for Word Embeddings of Korean Words</title>
    <author><first>Sanghyuk</first><last>Choi</last></author>
    <author><first>Taeuk</first><last>Kim</last></author>
    <author><first>Jinseok</first><last>Seol</last></author>
    <author><first>Sang-goo</first><last>Lee</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>36&#8211;40</pages>
    <url>http://www.aclweb.org/anthology/W17-4105</url>
    <abstract>Word embedding has become a fundamental component of many NLP tasks such as
	named entity recognition and machine translation. However, popular models that
	learn such embeddings are unaware of the morphology of words, so they are not
	directly applicable to highly agglutinative languages such as Korean. We
	propose a syllable-based learning model for Korean using a convolutional neural
	network, in which word representation is composed of trained syllable vectors.
	Our model successfully produces morphologically meaningful representation of
	Korean words compared to the original Skip-gram embeddings. The results also
	show that it is quite robust to the Out-of-Vocabulary problem.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>choi-EtAl:2017:SCLeM</bibkey>
  </paper>

  <paper id="4106">
    <title>Supersense Tagging with a Combination of Character, Subword, and Word-level Representations</title>
    <author><first>Youhyun</first><last>Shin</last></author>
    <author><first>Sang-goo</first><last>Lee</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>41&#8211;45</pages>
    <url>http://www.aclweb.org/anthology/W17-4106</url>
    <abstract>Recently, there has been increased interest in utilizing characters or subwords
	for natural language processing (NLP) tasks. However, the effect of utilizing
	character, subword, and word-level information simultaneously has not been
	examined so far. In this paper, we propose a model to leverage various levels
	of input features to improve the performance on a supersense tagging task.
	Detailed analysis of the experimental results shows that different levels of input
	representation offer distinct characteristics that explain the performance
	discrepancies among different tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shin-lee:2017:SCLeM</bibkey>
  </paper>

  <paper id="4107">
    <title>Weakly supervised learning of allomorphy</title>
    <author><first>Miikka</first><last>Silfverberg</last></author>
    <author><first>Mans</first><last>Hulden</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>46&#8211;56</pages>
    <url>http://www.aclweb.org/anthology/W17-4107</url>
    <abstract>Most NLP resources that offer annotations at the word segment level provide
	morphological annotation that includes features indicating tense, aspect,
	modality, gender, case, and other inflectional information.  Such information
	is rarely aligned to the relevant parts of the words, i.e. the allomorphs, as
	such annotation would be very costly.  These unaligned weak labelings are
	commonly provided by annotated NLP corpora such as treebanks in various
	languages.  Although they lack alignment information, the presence/absence of
	labels at the word level is also consistent with the amount of supervision
	assumed to be provided to L1 and L2 learners. In this paper, we explore several
	methods to learn this latent alignment between parts of word forms and the
	grammatical information provided.  All the methods under investigation favor
	hypotheses regarding allomorphs of morphemes that re-use a small inventory,
	i.e. implicitly minimize the number of allomorphs that a morpheme can be
	realized as.  We show that the provided information offers a significant
	advantage for both word segmentation and the learning of allomorphy.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>silfverberg-hulden:2017:SCLeM</bibkey>
  </paper>

  <paper id="4108">
    <title>Character-based recurrent neural networks for morphological relational reasoning</title>
    <author><first>Olof</first><last>Mogren</last></author>
    <author><first>Richard</first><last>Johansson</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>57&#8211;63</pages>
    <url>http://www.aclweb.org/anthology/W17-4108</url>
    <abstract>We present a model for predicting word forms based on
	    morphological relational reasoning with analogies. While
	    previous work has explored tasks such as morphological inflection
	    and reinflection, these models rely on an explicit enumeration
	    of morphological features, which may not be available in all cases.
	    To address the task of predicting a word form given a demo
	      relation (a pair of word forms) and a query word, we
	    devise a character-based recurrent neural network architecture
	    using three separate encoders and a decoder.
	    We also investigate a multiclass learning setup, where the
	    prediction of the relation type label is used as an auxiliary task.
	    Our results show that the exact form can be predicted for
	    English with an accuracy of 94.7%. For Swedish, which has a more
	    complex morphology with more inflectional patterns for nouns and
	    verbs, the accuracy is 89.3%. We also show that using the
	    auxiliary task of learning the relation type speeds up convergence
	    and improves the prediction accuracy for the word generation task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mogren-johansson:2017:SCLeM</bibkey>
  </paper>

  <paper id="4109">
    <title>Glyph-aware Embedding of Chinese Characters</title>
    <author><first>Falcon</first><last>Dai</last></author>
    <author><first>Zheng</first><last>Cai</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>64&#8211;69</pages>
    <url>http://www.aclweb.org/anthology/W17-4109</url>
    <abstract>Given the advantage and recent success of English character-level and
	subword-unit models in several NLP tasks, we consider the equivalent modeling
	problem for Chinese. Chinese script is logographic and many Chinese logograms
	are composed of common substructures that provide semantic, phonetic and
	syntactic hints. In this work, we propose to explicitly incorporate the visual
	appearance of a character’s glyph in its representation, resulting in a novel
	glyph-aware embedding of Chinese characters. Inspired by the success of
	convolutional neural networks in computer vision, we use them to incorporate
	the spatio-structural patterns of Chinese glyphs as rendered in raw pixels. In
	the context of two basic Chinese NLP tasks of language modeling and word
	segmentation, the model learns to represent each character’s task-relevant
	semantic and syntactic information in the character-level embedding.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dai-cai:2017:SCLeM</bibkey>
  </paper>

  <paper id="4110">
    <title>Exploring Cross-Lingual Transfer of Morphological Knowledge In Sequence-to-Sequence Models</title>
    <author><first>Huiming</first><last>Jin</last></author>
    <author><first>Katharina</first><last>Kann</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>70&#8211;75</pages>
    <url>http://www.aclweb.org/anthology/W17-4110</url>
    <abstract>Multi-task training is an effective method to mitigate the data sparsity
	problem. 
	It has recently been applied for cross-lingual transfer learning for paradigm
	completion (the task of producing inflected forms of lemmata) with
	sequence-to-sequence networks.
	However, it is still unclear how the model transfers knowledge across languages,
	and whether and which information is shared.
	To investigate this, we propose a set of data-dependent experiments using an
	existing encoder-decoder recurrent neural network for the task. Our results show
	that the performance gains indeed surpass a pure regularization effect and that
	knowledge about language and morphology can be transferred.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jin-kann:2017:SCLeM</bibkey>
  </paper>

  <paper id="4111">
    <title>Unlabeled Data for Morphological Generation With Character-Based Sequence-to-Sequence Models</title>
    <author><first>Katharina</first><last>Kann</last></author>
    <author><first>Hinrich</first><last>Sch&#252;tze</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>76&#8211;81</pages>
    <url>http://www.aclweb.org/anthology/W17-4111</url>
    <abstract>We present a semi-supervised way of training a character-based encoder-decoder
	recurrent neural network for morphological reinflection, the task of
	generating one inflected wordform from another. This is achieved by using
	unlabeled tokens or random strings as training data for an autoencoding task,
	adapting a network for morphological reinflection, and performing multi-task
	training.
	We thus use limited labeled data more effectively, obtaining up to 9.92%
	improvement over state-of-the-art baselines for 8 different languages.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kann-schutze:2017:SCLeM</bibkey>
  </paper>

  <paper id="4112">
    <title>Vowel and Consonant Classification through Spectral Decomposition</title>
    <author><first>Patricia</first><last>Thaine</last></author>
    <author><first>Gerald</first><last>Penn</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>82&#8211;91</pages>
    <url>http://www.aclweb.org/anthology/W17-4112</url>
    <attachment type="attachment">W17-4112.Attachment.rar</attachment>
    <abstract>We consider two related problems in this paper. Given an undeciphered
	alphabetic writing system or mono-alphabetic cipher, determine: (1) which of
	its letters are vowels and which are consonants; and (2) whether the writing
	system is a vocalic alphabet or an abjad.  We are able to show that a very
	simple spectral decomposition based on character co-occurrences provides nearly
	perfect performance with respect to answering both question types.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>thaine-penn:2017:SCLeM</bibkey>
  </paper>

  <paper id="4113">
    <title>Syllable-level Neural Language Model for Agglutinative Language</title>
    <author><first>Seunghak</first><last>Yu</last></author>
    <author><first>Nilesh</first><last>Kulkarni</last></author>
    <author><first>Haejun</first><last>Lee</last></author>
    <author><first>Jihie</first><last>Kim</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>92&#8211;96</pages>
    <url>http://www.aclweb.org/anthology/W17-4113</url>
    <attachment type="attachment">W17-4113.Attachment.zip</attachment>
    <abstract>We introduce a novel method to diminish the problem of out-of-vocabulary words
	by introducing an embedding method which leverages the agglutinative property
	of the language. We propose additional embeddings derived from syllables and
	morphemes to improve the performance of the language model. We apply
	the above method to input prediction tasks and achieve state-of-the-art
	performance in terms of Key Stroke Saving (KSS) with respect to existing device input
	prediction methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yu-EtAl:2017:SCLeM</bibkey>
  </paper>

  <paper id="4114">
    <title>Character-based Bidirectional LSTM-CRF with words and characters for Japanese Named Entity Recognition</title>
    <author><first>Shotaro</first><last>Misawa</last></author>
    <author><first>Motoki</first><last>Taniguchi</last></author>
    <author><first>Yasuhide</first><last>Miura</last></author>
    <author><first>Tomoko</first><last>Ohkuma</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>97&#8211;102</pages>
    <url>http://www.aclweb.org/anthology/W17-4114</url>
    <abstract>Recently, neural models have shown superior performance over conventional
	models in NER tasks. These models use a CNN to extract sub-word information
	along with an RNN to predict a tag for each word. However, these models have been tested
	almost entirely on English texts. It remains unclear whether they perform
	similarly in other languages. We worked on Japanese NER using neural models and
	discovered two obstacles for the state-of-the-art model.
	 First, a CNN is unsuitable for extracting Japanese sub-word information.
	Second, a model predicting a tag for each word cannot extract an entity when
	part of a word composes an entity. The contributions of this work are (1)
	verifying the effectiveness of the state-of-the-art NER model for Japanese, (2)
	proposing a neural model for predicting a tag for each character using word and
	character information. Experimental results demonstrate that our
	model outperforms the state-of-the-art neural English NER model in Japanese.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>misawa-EtAl:2017:SCLeM</bibkey>
  </paper>

  <paper id="4115">
    <title>Word Representation Models for Morphologically Rich Languages in Neural Machine Translation</title>
    <author><first>Ekaterina</first><last>Vylomova</last></author>
    <author><first>Trevor</first><last>Cohn</last></author>
    <author><first>Xuanli</first><last>He</last></author>
    <author><first>Gholamreza</first><last>Haffari</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>103&#8211;108</pages>
    <url>http://www.aclweb.org/anthology/W17-4115</url>
    <abstract>Out-of-vocabulary words present a great challenge for machine translation.
	Recently, various character-level compositional models
	were proposed to address this issue. In the current work,
	we incorporate the two most popular neural architectures, namely LSTM and CNN, into
	hard- and soft-attentional models of translation for character-level
	representation of the source. We propose a semantic and morphological intrinsic
	evaluation of encoder-level representations. Our analysis of the learned
	representations reveals that the character-based LSTM seems to be better at
	capturing morphological aspects than the character-based CNN. We also show
	that the hard-attentional model provides better character-level representations
	than the vanilla one.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>vylomova-EtAl:2017:SCLeM</bibkey>
  </paper>

  <paper id="4116">
    <title>Spell-Checking based on Syllabification and Character-level Graphs for a Peruvian Agglutinative Language</title>
    <author><first>Carlo</first><last>Alva</last></author>
    <author><first>Arturo</first><last>Oncevay</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>109&#8211;116</pages>
    <url>http://www.aclweb.org/anthology/W17-4116</url>
    <abstract>There are several native languages in Peru which are mostly agglutinative.
	These languages are transmitted from generation to generation mainly in oral
	form, causing different forms of writing across different communities. For this
	reason, there are recent efforts to standardize the spelling in the written
	texts, and it would be beneficial to support these tasks with an automatic tool
	such as a spell-checker. Thus, this spell-checker is being
	developed in two steps: an automatic rule-based syllabification method
	and a character-level graph to detect the degree of error in a misspelled word.
	The experiments were performed on Shipibo-konibo, a highly agglutinative
	Amazonian language, and the results obtained on a dataset
	built for this purpose are promising.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>alva-oncevay:2017:SCLeM</bibkey>
  </paper>

  <paper id="4117">
    <title>What do we need to know about an unknown word when parsing German</title>
    <author><first>Bich-Ngoc</first><last>Do</last></author>
    <author><first>Ines</first><last>Rehbein</last></author>
    <author><first>Anette</first><last>Frank</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>117&#8211;123</pages>
    <url>http://www.aclweb.org/anthology/W17-4117</url>
    <abstract>We propose a new type of subword embedding designed to provide more information
	about unknown compounds, a major source of OOV words in German. We present an
	extrinsic evaluation where we use the compound embeddings as input to a neural
	dependency parser and compare the results to the ones obtained with other types
	of embeddings. Our evaluation shows that adding compound embeddings yields a
	significant improvement of 2% LAS over using word embeddings when no POS
	information is available. When adding POS embeddings to the input, however,
	the effect levels out. This suggests that it is not the missing information
	about the semantics of the unknown words that causes problems for parsing
	German, but the lack of morphological information for unknown words. To augment
	our evaluation, we also test the new embeddings in a language modelling task
	that requires both syntactic and semantic information.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>do-rehbein-frank:2017:SCLeM</bibkey>
  </paper>

  <paper id="4118">
    <title>A General-Purpose Tagger with Convolutional Neural Networks</title>
    <author><first>Xiang</first><last>Yu</last></author>
    <author><first>Agnieszka</first><last>Falenska</last></author>
    <author><first>Ngoc Thang</first><last>Vu</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>124&#8211;129</pages>
    <url>http://www.aclweb.org/anthology/W17-4118</url>
    <abstract>We present a general-purpose tagger based on convolutional neural networks
	(CNNs), which are used both for composing word vectors and for encoding context information.
	The CNN tagger is robust across different tagging tasks: without task-specific
	tuning of hyper-parameters, it achieves state-of-the-art results in
	part-of-speech tagging, morphological tagging and supertagging. The CNN tagger
	is also robust against the out-of-vocabulary problem; it performs well on
	artificially unnormalized texts.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yu-falenska-vu:2017:SCLeM</bibkey>
  </paper>

  <paper id="4119">
    <title>Reconstruction of Word Embeddings from Sub-Word Parameters</title>
    <author><first>Karl</first><last>Stratos</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>130&#8211;135</pages>
    <url>http://www.aclweb.org/anthology/W17-4119</url>
    <abstract>Pre-trained word embeddings improve the performance of a neural model at the
	cost of increasing the model size. We propose to benefit from this resource
	without paying the cost by operating strictly at the sub-lexical level. Our
	approach is quite simple: before task-specific training, we first optimize
	sub-word parameters to reconstruct pre-trained word embeddings using various
	distance measures. We report interesting results on a variety of tasks: word
	similarity, word analogy, and part-of-speech tagging.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>stratos:2017:SCLeM</bibkey>
  </paper>

  <paper id="4120">
    <title>Inflection Generation for Spanish Verbs using Supervised Learning</title>
    <author><first>Cristina</first><last>Barros</last></author>
    <author><first>Dimitra</first><last>Gkatzia</last></author>
    <author><first>Elena</first><last>Lloret</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>136&#8211;141</pages>
    <url>http://www.aclweb.org/anthology/W17-4120</url>
    <abstract>We present a novel supervised approach to inflection generation for verbs in
	Spanish. Our system takes as input the verb's lemma form and the desired
	features, such as person, number, and tense, and is able to predict the appropriate
	grammatical conjugation. Even though our approach learns from fewer examples
	compared to previous work, it is able to deal with all the Spanish moods
	(indicative, subjunctive and imperative) in contrast to previous work which
	only focuses on indicative and subjunctive moods. We show that in an intrinsic
	evaluation, our system achieves 99% accuracy, outperforming (although not
	significantly) two competitive state-of-the-art systems. The successful results
	obtained clearly indicate that our approach could be integrated into wider
	approaches related to text generation in Spanish.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>barros-gkatzia-lloret:2017:SCLeM</bibkey>
  </paper>

  <paper id="4121">
    <title>Neural Paraphrase Identification of Questions with Noisy Pretraining</title>
    <author><first>Gaurav Singh</first><last>Tomar</last></author>
    <author><first>Thyago</first><last>Duque</last></author>
    <author><first>Oscar</first><last>T&#228;ckstr&#246;m</last></author>
    <author><first>Jakob</first><last>Uszkoreit</last></author>
    <author><first>Dipanjan</first><last>Das</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>142&#8211;147</pages>
    <url>http://www.aclweb.org/anthology/W17-4121</url>
    <abstract>We present a solution to the problem of paraphrase identification of questions.
	We focus on a recent dataset of question pairs annotated with binary paraphrase
	labels and show that a variant of the decomposable attention model (replacing
	the word embeddings of the decomposable attention model of Parikh et al. 2016
	with character n-gram representations) results in accurate performance on this
	task, while being far simpler than many competing neural architectures.
	Furthermore, when the model is pretrained on a noisy dataset of automatically
	collected question paraphrases, it obtains the best reported performance on the
	dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tomar-EtAl:2017:SCLeM</bibkey>
  </paper>

  <paper id="4122">
    <title>Sub-character Neural Language Modelling in Japanese</title>
    <author><first>Viet</first><last>Nguyen</last></author>
    <author><first>Julian</first><last>Brooke</last></author>
    <author><first>Timothy</first><last>Baldwin</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>148&#8211;153</pages>
    <url>http://www.aclweb.org/anthology/W17-4122</url>
    <abstract>In East Asian languages such as Japanese and Chinese, the semantics of
	  a character are (somewhat) reflected in its sub-character
	  elements. This paper examines the effect of using sub-characters for
	  language modeling in Japanese. This is achieved by decomposing
	  characters according to a range of character decomposition datasets,
	  and training a neural language model over variously decomposed
	  character representations. Our results indicate that language modelling
	  can be improved through the inclusion of sub-characters, though this
	  result depends on a good choice of decomposition dataset and the
	  appropriate granularity of decomposition.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nguyen-brooke-baldwin:2017:SCLeM</bibkey>
  </paper>

  <paper id="4123">
    <title>Byte-based Neural Machine Translation</title>
    <author><first>Marta R.</first><last>Costa-juss&#224;</last></author>
    <author><first>Carlos</first><last>Escolano</last></author>
    <author><first>Jos&#233; A. R.</first><last>Fonollosa</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>154&#8211;158</pages>
    <url>http://www.aclweb.org/anthology/W17-4123</url>
    <abstract>This paper presents experiments comparing character-based and byte-based neural
	machine translation systems. The main motivation of the byte-based neural
	machine translation system is to build multi-lingual neural machine translation
	systems that can share the same vocabulary. We compare the performance of both
	systems on several language pairs and see that the test performance is
	similar for most language pairs, while the training time is slightly reduced in
	the case of byte-based neural machine translation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>costajussa-escolano-fonollosa:2017:SCLeM</bibkey>
  </paper>

  <paper id="4124">
    <title>Improving Opinion-Target Extraction with Character-Level Word Embeddings</title>
    <author><first>Soufian</first><last>Jebbara</last></author>
    <author><first>Philipp</first><last>Cimiano</last></author>
    <booktitle>Proceedings of the First Workshop on Subword and Character Level Models in NLP</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>159&#8211;167</pages>
    <url>http://www.aclweb.org/anthology/W17-4124</url>
    <abstract>Fine-grained sentiment analysis has been receiving increasing attention in recent
	years.
	Extracting opinion target expressions (OTE) in reviews is often an important
	step in fine-grained, aspect-based sentiment analysis.
	Retrieving this information from user-generated text, however, can be
	difficult.
	Customer reviews, for instance, are prone to contain misspelled words and are
	difficult to process due to their domain-specific language.
	In this work, we investigate whether character-level models can improve the
	performance for the identification of opinion target expressions.
	We integrate information about the character structure of a word into a
	sequence labeling system using character-level word embeddings and show their
	positive impact on the system's performance.
	Specifically, we obtain an increase of 3.3 points in F1-score over our
	baseline model.
	In further experiments, we reveal encoded character patterns of the learned
	embeddings and give a nuanced view of the performance differences between the two
	models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jebbara-cimiano:2017:SCLeM</bibkey>
  </paper>

</volume>

