<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="2200">
    <title>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</title>
    <editor>Beatrice Alex</editor>
    <editor>Stefania Degaetano-Ortlieb</editor>
    <editor>Anna Feldman</editor>
    <editor>Anna Kazantseva</editor>
    <editor>Nils Reiter</editor>
    <editor>Stan Szpakowicz</editor>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-22</url>
    <bibtype>book</bibtype>
    <bibkey>LaTeCH-CLfL:2017</bibkey>
  </paper>

  <paper id="2201">
    <title>Metaphor Detection in a Poetry Corpus</title>
    <author><first>Vaibhav</first><last>Kesarwani</last></author>
    <author><first>Diana</first><last>Inkpen</last></author>
    <author><first>Stan</first><last>Szpakowicz</last></author>
    <author><first>Chris</first><last>Tanasescu</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;9</pages>
    <url>http://www.aclweb.org/anthology/W17-2201</url>
    <abstract>Metaphor is indispensable in poetry. It showcases the poet's creativity, and
	contributes to the overall emotional pertinence of the poem while honing its
	specific rhetorical impact. Previous work on metaphor detection relies on
	either rule-based or statistical models, none of them applied to poetry. Our
	method focuses on metaphor detection in a poetry corpus. It combines rule-based
	and statistical models (word embeddings) to develop a new classification
	system. Our system has achieved a precision of 0.759 and a recall of 0.804 in
	identifying one type of metaphor in poetry.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kesarwani-EtAl:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2202">
    <title>Machine Translation and Automated Analysis of the Sumerian Language</title>
    <author><first>&#201;milie</first><last>Pag&#233;-Perron</last></author>
    <author><first>Maria</first><last>Sukhareva</last></author>
    <author><first>Ilya</first><last>Khait</last></author>
    <author><first>Christian</first><last>Chiarcos</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>10&#8211;16</pages>
    <url>http://www.aclweb.org/anthology/W17-2202</url>
    <abstract>This paper presents a newly funded international project for machine
	translation and automated analysis of ancient cuneiform languages where NLP
	specialists and Assyriologists collaborate to create an information retrieval
	system for Sumerian. This research is conceived in response to the need to
	translate large numbers of administrative texts that are only available in
	transcription, in order to make them accessible to a wider audience. The
	methodology includes creation of a specialized NLP pipeline and also the use of
	linguistic linked open data to increase access to the results.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>pageperron-EtAl:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2203">
    <title>Investigating the Relationship between Literary Genres and Emotional Plot Development</title>
    <author><first>Evgeny</first><last>Kim</last></author>
    <author><first>Sebastian</first><last>Pad&#243;</last></author>
    <author><first>Roman</first><last>Klinger</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>17&#8211;26</pages>
    <url>http://www.aclweb.org/anthology/W17-2203</url>
    <abstract>Literary genres are commonly viewed as being defined in terms of content and
	stylistic features. In this paper, we focus on one particular class of lexical
	features, namely emotion information, and investigate the hypothesis that
	emotion-related information correlates with particular genres. Using genre
	classification as a testbed, we compare a model that computes lexicon-based
	emotion scores globally for complete stories with a model that tracks emotion
	arcs through stories on a subset of Project Gutenberg with five genres.
	Our main findings are: (a), the global emotion model is competitive with a
	large-vocabulary bag-of-words genre classifier (80 % F1); (b), the emotion arc
	model shows a lower performance (59 % F1) but shows complementary behavior to
	the global model, as indicated by a very good performance of an oracle model
	(94 % F1) and an improved performance of an ensemble model (84 % F1); (c),
	genres differ in the extent to which stories follow the same emotional arcs,
	with particularly uniform behavior for anger (mystery) and fear (adventures,
	romance, humor, science fiction).</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kim-pado-klinger:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2204">
    <title>Enjambment Detection in a Large Diachronic Corpus of Spanish Sonnets</title>
    <author><first>Pablo</first><last>Ruiz</last></author>
    <author><first>Clara</first><last>Mart&#237;nez Cant&#243;n</last></author>
    <author><first>Thierry</first><last>Poibeau</last></author>
    <author><first>Elena</first><last>Gonz&#225;lez-Blanco</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>27&#8211;32</pages>
    <url>http://www.aclweb.org/anthology/W17-2204</url>
    <abstract>Enjambment takes place when a syntactic unit is broken up across two lines of
	poetry, giving rise to different stylistic effects. In Spanish literary
	studies, there are unclear points about the types of stylistic effects that can
	arise, and under which linguistic conditions. To systematically gather evidence
	about this, we developed a system to automatically identify enjambment (and its
	type) in Spanish. For evaluation, we manually annotated a reference corpus
	covering different periods. As a scholarly corpus to apply the tool, from
	public HTML sources we created a diachronic corpus covering four centuries of
	sonnets (3750 poems), and we analyzed the occurrence of enjambment across
	stanzaic boundaries in different periods. Besides, we found examples that
	highlight limitations in current definitions of enjambment.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ruiz-EtAl:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2205">
    <title>Plotting Markson's "Mistress"</title>
    <author><first>Conor</first><last>Kelleher</last></author>
    <author><first>Mark</first><last>Keane</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>33&#8211;39</pages>
    <url>http://www.aclweb.org/anthology/W17-2205</url>
    <abstract>The post-modern novel "Wittgenstein’s Mistress" by David Markson (1988)
	presents the reader with a very challenging non-linear narrative that itself
	appears to be one of the novel’s themes. We present a distant reading of this
	work designed to complement a close reading of it by David Foster Wallace
	(1990). Using a combination of text analysis, entity recognition and
	networks, we plot repetitive structures in the novel’s narrative relating
	them to its critical analysis.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kelleher-keane:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2206">
    <title>Annotation Challenges for Reconstructing the Structural Elaboration of Middle Low German</title>
    <author><first>Nina</first><last>Seemann</last></author>
    <author><first>Marie-Luis</first><last>Merten</last></author>
    <author><first>Michaela</first><last>Geierhos</last></author>
    <author><first>Doris</first><last>Tophinke</last></author>
    <author><first>Eyke</first><last>H&#252;llermeier</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>40&#8211;45</pages>
    <url>http://www.aclweb.org/anthology/W17-2206</url>
    <abstract>In this paper, we present the annotation challenges we have encountered when
	working on a historical language that was undergoing elaboration processes. We
	especially focus on syntactic ambiguity and gradience in Middle Low German,
	which causes uncertainty to some extent. Since current annotation tools
	consider construction contexts and the dynamics of the grammaticalization only
	partially, we plan to extend CorA - a web-based annotation tool for historical
	and other non-standard language data - to capture elaboration phenomena and
	annotator unsureness. Moreover, we seek to interactively learn morphological as
	well as syntactic annotations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>seemann-EtAl:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2207">
    <title>Phonological Soundscapes in Medieval Poetry</title>
    <author><first>Christopher</first><last>Hench</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>46&#8211;56</pages>
    <url>http://www.aclweb.org/anthology/W17-2207</url>
    <abstract>The oral component of medieval poetry was integral to its performance and
	reception. Yet many believe that the medieval voice has been forever lost, and
	any attempts at rediscovering it are doomed to failure due to scribal
	practices, manuscript mouvance, and linguistic normalization in editing
	practices. This paper offers a method to abstract from this noise and better
	understand relative differences in phonological soundscapes by considering
	syllable qualities. The presented syllabification method and soundscape
	analysis offer themselves as cross-disciplinary tools for low-resource
	languages. As a case study, we examine medieval German lyric and argue that the
	heavily debated lyrical ‘I’ follows a unique trajectory through
	soundscapes, shedding light on the performance and practice of these poets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hench:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2208">
    <title>An End-to-end Environment for Research Question-Driven Entity Extraction and Network Analysis</title>
    <author><first>Andre</first><last>Blessing</last></author>
    <author><first>Nora</first><last>Echelmeyer</last></author>
    <author><first>Markus</first><last>John</last></author>
    <author><first>Nils</first><last>Reiter</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>57&#8211;67</pages>
    <url>http://www.aclweb.org/anthology/W17-2208</url>
    <abstract>This paper presents an approach to extract co-occurrence networks from literary
	texts. It is a deliberate decision not to aim for a fully automatic pipeline,
	as the literary research questions need to guide both the definition of the
	nature of the things that co-occur as well as how to decide co-occurrence. We
	showcase the approach on a Middle High German romance, Parzival. Manual inspection
	and discussion show the huge impact various choices have.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>blessing-EtAl:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2209">
    <title>Modeling intra-textual variation with entropy and surprisal: topical vs. stylistic patterns</title>
    <author><first>Stefania</first><last>Degaetano-Ortlieb</last></author>
    <author><first>Elke</first><last>Teich</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>68&#8211;77</pages>
    <url>http://www.aclweb.org/anthology/W17-2209</url>
    <abstract>We present a data-driven approach to investigate intra-textual variation by
	combining entropy and surprisal. With this approach we detect linguistic
	variation based on phrasal lexico-grammatical patterns across sections of
	research articles. Entropy is used to detect patterns typical of specific
	sections. Surprisal is used to differentiate between more and less
	informationally-loaded patterns as well as type of information (topical vs.
	stylistic). While we here focus on research articles in biology/genetics, the
	methodology is especially interesting for digital humanities scholars, as it
	can be applied to any text type or domain and combined with additional
	variables (e.g. time, author or social group).</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>degaetanoortlieb-teich:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2210">
    <title>Finding a Character’s Voice: Stylome Classification on Literary Characters</title>
    <author><first>Liviu P.</first><last>Dinu</last></author>
    <author><first>Ana Sabina</first><last>Uban</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>78&#8211;82</pages>
    <url>http://www.aclweb.org/anthology/W17-2210</url>
    <abstract>We investigate in this paper the problem of classifying the stylome of
	characters in a literary work. Previous research in the field of authorship
	attribution has shown that the writing style of an author can be characterized
	and distinguished from that of other authors automatically. In this paper we
	take a look at the less approached problem of how the styles of different
	characters can be distinguished, trying to verify if an author managed to
	create believable characters with individual styles. We present the results of
	some initial experiments developed on the novel "Liaisons Dangereuses", showing
	that a simple bag of words model can be used to classify the characters.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dinu-uban:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2211">
    <title>An Ontology-Based Method for Extracting and Classifying Domain-Specific Compositional Nominal Compounds</title>
    <author><first>Maria Pia</first><last>di Buono</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>83&#8211;88</pages>
    <url>http://www.aclweb.org/anthology/W17-2211</url>
    <abstract>In this paper, we present our preliminary study on an ontology-based method to
	extract and classify compositional nominal compounds in specific domains of
	knowledge. This method is based on the assumption that, applying a conceptual
	model to represent a knowledge domain, it is possible to improve the extraction
	and classification of lexicon occurrences for that domain in a semi-automatic
	way. We explore the possibility of extracting and classifying a specific
	construction type (nominal compounds) spanning a specific domain (Cultural
	Heritage) and a specific language (Italian).</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dibuono:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2212">
    <title>Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin</title>
    <author><first>G&#233;raldine</first><last>Walther</last></author>
    <author><first>Beno&#238;t</first><last>Sagot</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>89&#8211;94</pages>
    <url>http://www.aclweb.org/anthology/W17-2212</url>
    <abstract>In this paper, we present ongoing work for developing language resources and
	basic NLP tools for an undocumented variety of Romansh, in the context of a
	language documentation and language acquisition project. Our tools are meant to
	improve the speed and reliability of corpus annotations for noisy data
	involving large amounts of code-switching, occurrences of child-speech and
	orthographic noise. Being able to increase the efficiency of language resource
	development for language documentation and acquisition research also
	constitutes a step towards solving the data sparsity issues with which
	researchers have been struggling.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>walther-sagot:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2213">
    <title>Distantly Supervised POS Tagging of Low-Resource Languages under Extreme Data Sparsity: The Case of Hittite</title>
    <author><first>Maria</first><last>Sukhareva</last></author>
    <author><first>Francesco</first><last>Fuscagni</last></author>
    <author><first>Johannes</first><last>Daxenberger</last></author>
    <author><first>Susanne</first><last>G&#246;rke</last></author>
    <author><first>Doris</first><last>Prechel</last></author>
    <author><first>Iryna</first><last>Gurevych</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>95&#8211;104</pages>
    <url>http://www.aclweb.org/anthology/W17-2213</url>
    <abstract>This paper presents a statistical approach to automatic morphosyntactic
	annotation of Hittite transcripts. Hittite is an extinct Indo-European language
	using the cuneiform script. There are currently no morphosyntactic annotations
	available for Hittite, so we explored methods of distant supervision. The
	annotations were projected from parallel German translations of the Hittite
	texts. In order to reduce data sparsity, we applied stemming of German and
	Hittite texts. As there is no off-the-shelf Hittite stemmer, a stemmer for
	Hittite was developed for this purpose. The resulting annotation projections
	were used to train a POS tagger, achieving an accuracy of 69% on a test sample.
	To our knowledge, this is the first attempt at statistical POS tagging of a
	cuneiform language.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sukhareva-EtAl:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2214">
    <title>A Dataset for Sanskrit Word Segmentation</title>
    <author><first>Amrith</first><last>Krishna</last></author>
    <author><first>Pavan Kumar</first><last>Satuluri</last></author>
    <author><first>Pawan</first><last>Goyal</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>105&#8211;114</pages>
    <url>http://www.aclweb.org/anthology/W17-2214</url>
    <abstract>The last decade saw a surge in digitisation efforts for ancient manuscripts in
	Sanskrit. Due to various linguistic peculiarities inherent to the language,
	even the preliminary tasks such as word segmentation are non-trivial in
	Sanskrit. Elegant models for Word Segmentation in Sanskrit are indispensable
	for further syntactic and semantic processing of the manuscripts. Current works
	in word segmentation for Sanskrit, though commendable in their novelty, often
	have variations in their objective and evaluation criteria. In this work, we
	set the record straight. We formally define the objectives and the requirements
	for the word segmentation task. In order to encourage research in the field and
	to alleviate the time and effort required in pre-processing, we release a
	dataset of 115,000 sentences for word segmentation. For each sentence in the
	dataset we include the input character sequence, ground truth segmentation, and
	additionally lexical and morphological information about all the phonetically
	possible segments for the given sentence. In this work, we also discuss the
	linguistic considerations made while generating the candidate space of the
	possible segments.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>krishna-satuluri-goyal:2017:LaTeCH-CLfL</bibkey>
  </paper>

  <paper id="2215">
    <title>Lexical Correction of Polish Twitter Political Data</title>
    <author><first>Maciej</first><last>Ogrodniczuk</last></author>
    <author><first>Mateusz</first><last>Kope&#x107;</last></author>
    <booktitle>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>115&#8211;125</pages>
    <url>http://www.aclweb.org/anthology/W17-2215</url>
    <abstract>Language processing architectures are often evaluated in near-to-perfect
	conditions with respect to processed content. The tools which perform
	sufficiently well on electronic press, books and other types of non-interactive
	content may poorly handle littered, colloquial and multilingual textual data
	which make up the majority of communication today. This paper aims at
	investigating how Polish Twitter data (in a slightly controlled 'political'
	flavour) differs from the expectations of linguistic tools and how it could be
	corrected to be ready for processing by standard language processing chains
	available for Polish. The setting includes specialised components for spelling
	correction of tweets as well as hashtag and username decoding.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ogrodniczuk-kopec:2017:LaTeCH-CLfL</bibkey>
  </paper>

</volume>

