<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W16">
  <paper id="4000">
    <title>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</title>
    <editor>Erhard Hinrichs</editor>
    <editor>Marie Hinrichs</editor>
    <editor>Thorsten Trippel</editor>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <url>http://aclweb.org/anthology/W16-40</url>
    <bibtype>book</bibtype>
    <bibkey>LT4DH:2016</bibkey>
  </paper>

  <paper id="4001">
    <title>Flexible and Reliable Text Analytics in the Digital Humanities &#8211; Some Methodological Considerations</title>
    <author><first>Jonas</first><last>Kuhn</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>1</pages>
    <url>http://aclweb.org/anthology/W16-4001</url>
    <abstract>The availability of Language Technology Resources and Tools generates a
	considerable methodological potential in the Digital Humanities: aspects of
	research questions from the Humanities and Social Sciences can be addressed on
	text collections in ways that were unavailable to traditional approaches.  I
	start this talk by sketching some sample scenarios of Digital Humanities
	projects which involve various Humanities and Social Science disciplines,
	noting that the potential for a meaningful contribution to higher-level
	questions is highest when the employed language technological models are
	carefully tailored both (a) to characteristics of the given target corpus, and
	(b) to relevant analytical subtasks feeding the discipline-specific research
	questions.
	Maintaining a multidisciplinary perspective, I then point out a recurrent
	dilemma in Digital Humanities projects that follow the conventional set-up of
	collaboration: to build high-quality computational models for the data, fixed
	analytical targets should be specified as early as possible -- but to be able
	to respond to Humanities questions as they evolve over the course of analysis,
	the analytical machinery should be kept maximally flexible.  To meet both
	needs, I argue for a novel collaborative culture that rests on a more interleaved,
	continuous dialogue.  (Re-)Specification of analytical targets should be an
	ongoing process in which the Humanities Scholars and Social Scientists play a
	role that is as important as the Computational Scientists' role.  A promising
	approach lies in the identification of recurring types of analytical
	subtasks, beyond standard linguistic tasks, which can form building blocks for
	text analysis across disciplines, and for which corpus-based characterizations
	(viz. annotations) can be collected, compared and revised.  On such grounds,
	computational modeling is more directly tied to the evolving research
	questions, and hence the seemingly opposing needs of reliable target
	specifications vs. "malleable" frameworks of analysis can be reconciled. 
	Experimental work following this approach is under way in the Center for
	Reflected Text Analytics (CRETA) in Stuttgart.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kuhn:2016:LT4DH</bibkey>
  </paper>

  <paper id="4002">
    <title>Finding Rising and Falling Words</title>
    <author><first>Erik</first><last>Tjong Kim Sang</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>2&#8211;9</pages>
    <url>http://aclweb.org/anthology/W16-4002</url>
    <abstract>We examine two different methods for finding rising words (including
	neologisms) and falling words (including archaisms) in decades of magazine
	texts (millions of words) and in years of tweets (billions of words): one based
	on correlation coefficients of relative frequencies and time, and one based on
	comparing initial and final word frequencies of time intervals. We find that
	smoothing frequency scores improves the precision scores of both methods and
	that the correlation coefficients perform better on magazine text but worse on
	tweets. Since the two ranking methods find different words, they can be used
	side by side to study the behavior of words over time.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tjongkimsang:2016:LT4DH</bibkey>
  </paper>

  <paper id="4003">
    <title>A Dataset for Multimodal Question Answering in the Cultural Heritage Domain</title>
    <author><first>Shurong</first><last>Sheng</last></author>
    <author><first>Luc</first><last>Van Gool</last></author>
    <author><first>Marie-Francine</first><last>Moens</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>10&#8211;17</pages>
    <url>http://aclweb.org/anthology/W16-4003</url>
    <abstract>Multimodal question answering in the cultural heritage domain allows visitors
	to ask questions in a more natural way and thus provides better user
	experiences with cultural objects while visiting a museum, landmark or any
	other historical site. In this paper, we introduce the construction of a gold
	standard dataset that will aid research on multimodal question answering in the
	cultural heritage domain. The dataset, which will soon be released to the
	public, contains multimodal content including images of typical artworks from
	the fascinating ancient Egyptian Amarna period, related image-containing documents
	of the artworks and over 800 multimodal queries integrating visual and textual
	questions. The multimodal questions and related documents are all in English.
	The multimodal questions are linked to relevant paragraphs in the related
	documents that contain the answer to the multimodal query.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sheng-vangool-moens:2016:LT4DH</bibkey>
  </paper>

  <paper id="4004">
    <title>Extracting Social Networks from Literary Text with Word Embedding Tools</title>
    <author><first>Gerhard</first><last>Wohlgenannt</last></author>
    <author><first>Ekaterina</first><last>Chernyak</last></author>
    <author><first>Dmitry</first><last>Ilvovsky</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>18&#8211;25</pages>
    <url>http://aclweb.org/anthology/W16-4004</url>
    <abstract>In this paper a social network is extracted from a literary text. The social
	network shows how frequently the characters interact and how similar their
	social behavior is. Two types of similarity measures are used: the first
	applies co-occurrence statistics, while the second exploits cosine similarity
	on different types of word embedding vectors.
	The results are evaluated by a paid micro-task crowdsourcing survey. The
	experiments suggest that specific types of word embeddings like word2vec are
	well-suited for the task at hand and the specific circumstances of literary
	fiction text.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wohlgenannt-chernyak-ilvovsky:2016:LT4DH</bibkey>
  </paper>

  <paper id="4005">
    <title>Exploration of register-dependent lexical semantics using word embeddings</title>
    <author><first>Andrey</first><last>Kutuzov</last></author>
    <author><first>Elizaveta</first><last>Kuzmenko</last></author>
    <author><first>Anna</first><last>Marakasova</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>26&#8211;34</pages>
    <url>http://aclweb.org/anthology/W16-4005</url>
    <abstract>We present an approach to detect differences in lexical semantics across
	English language registers, using word embedding models from the distributional
	semantics paradigm. Models trained on register-specific subcorpora of the BNC
	are employed to compare lists of nearest associates for particular words
	and draw conclusions about their semantic shifts depending on the register in
	which they are used. The models are evaluated on the task of register
	classification with the help of the deep inverse regression approach.
	Additionally, we present a demo web service featuring most of the described
	models and allowing users to explore word meanings in different English
	registers and to detect register affiliation for arbitrary texts. The code for
	the service can be easily adapted to any set of underlying models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kutuzov-kuzmenko-marakasova:2016:LT4DH</bibkey>
  </paper>

  <paper id="4006">
    <title>Original-Transcribed Text Alignment for Manyosyu Written by Old Japanese Language</title>
    <author><first>Teruaki</first><last>Oka</last></author>
    <author><first>Tomoaki</first><last>Kono</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>35&#8211;44</pages>
    <url>http://aclweb.org/anthology/W16-4006</url>
    <abstract>We are constructing annotated diachronic corpora of the Japanese language.
	As part of this work, we construct a corpus of Manyosyu, which is an old
	Japanese poetry anthology. In this paper, we describe how to align the
	transcribed text and its original text semiautomatically to be able to
	cross-reference them in our Manyosyu corpus. Although we align the original
	characters to the transcribed words manually, we preliminarily align the
	transcribed and original characters by using an unsupervised automatic
	alignment technique from statistical machine translation to alleviate the
	work. We found that automatic alignment achieves an F1-measure of 0.83; thus,
	each poem has 1--2 alignment errors. However, finding these errors and
	modifying them is less work-intensive and more efficient than fully manual
	annotation. The alignment probabilities can be utilized in this modification.
	Moreover, we found that we can locate the uncertain transcriptions in our
	corpus and compare them to other transcriptions by using the alignment
	probabilities.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>oka-kono:2016:LT4DH</bibkey>
  </paper>

  <paper id="4007">
    <title>Shamela: A Large-Scale Historical Arabic Corpus</title>
    <author><first>Yonatan</first><last>Belinkov</last></author>
    <author><first>Alexander</first><last>Magidow</last></author>
    <author><first>Maxim</first><last>Romanov</last></author>
    <author><first>Avi</first><last>Shmidman</last></author>
    <author><first>Moshe</first><last>Koppel</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>45&#8211;53</pages>
    <url>http://aclweb.org/anthology/W16-4007</url>
    <abstract>Arabic is a widely-spoken language with a rich and long history spanning more
	than fourteen centuries. Yet existing Arabic corpora largely focus on the
	modern period or lack sufficient diachronic information. We develop a
	large-scale, historical corpus of Arabic of about 1 billion words from diverse
	periods of time. We clean this corpus, process it with a morphological
	analyzer, and enhance it by detecting parallel passages and automatically
	dating undated texts. We demonstrate its utility with selected case-studies in
	which we show its application to the digital humanities.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>belinkov-EtAl:2016:LT4DH</bibkey>
  </paper>

  <paper id="4008">
    <title>Feelings from the Past—Adapting Affective Lexicons for Historical Emotion Analysis</title>
    <author><first>Sven</first><last>Buechel</last></author>
    <author><first>Johannes</first><last>Hellrich</last></author>
    <author><first>Udo</first><last>Hahn</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>54&#8211;61</pages>
    <url>http://aclweb.org/anthology/W16-4008</url>
    <abstract>We here describe a novel methodology for measuring affective language in
	historical text by expanding an affective lexicon and jointly adapting it to
	prior language stages. We automatically construct a lexicon for word-emotion
	association of 18th and 19th century German which is then validated against
	expert ratings. Subsequently, this resource is used to identify distinct
	emotional patterns and trace long-term emotional trends in different genres of
	writing spanning several centuries.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>buechel-hellrich-hahn:2016:LT4DH</bibkey>
  </paper>

  <paper id="4009">
    <title>Automatic parsing as an efficient pre-annotation tool for historical texts</title>
    <author><first>Hanne Martine</first><last>Eckhoff</last></author>
    <author><first>Aleksandrs</first><last>Berdicevskis</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>62&#8211;70</pages>
    <url>http://aclweb.org/anthology/W16-4009</url>
    <abstract>Historical treebanks tend to be manually annotated, which is not surprising,
	since state-of-the-art parsers are not accurate enough to ensure high-quality
	annotation for historical texts. We test whether automatic parsing can be an
	efficient pre-annotation tool for Old East Slavic texts. We use the TOROT
	treebank from the PROIEL treebank family. We convert the PROIEL format to the
	CONLL format and use MaltParser to create syntactic pre-annotation. Using the
	most conservative evaluation method, which takes into account PROIEL-specific
	features, MaltParser by itself yields 0.845 unlabelled attachment score, 0.779
	labelled attachment score and 0.741 secondary dependency accuracy (note,
	though, that the test set comes from a relatively simple genre and contains
	rather short sentences). Experiments with human annotators show that
	preparsing, if limited to sentences where no changes to word or sentence
	boundaries are required, increases their annotation rate. For experienced
	annotators, the speed gain varies from 5.80% to 16.57%, for inexperienced
	annotators from 14.61% to 32.17% (using conservative estimates). There are no
	strong reliable differences in the annotation accuracy, which means that there
	is no reason to suspect that using preparsing might lower the final annotation
	quality.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>eckhoff-berdicevskis:2016:LT4DH</bibkey>
  </paper>

  <paper id="4010">
    <title>A Visual Representation of Wittgenstein's Tractatus Logico-Philosophicus</title>
    <author><first>Anca</first><last>Bucur</last></author>
    <author><first>Sergiu</first><last>Nisioi</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>71&#8211;75</pages>
    <url>http://aclweb.org/anthology/W16-4010</url>
    <abstract>In this paper we will discuss a method for data visualization together with its
	potential usefulness in digital humanities and philosophy of language. We
	compiled a multilingual parallel corpus from different versions of
	Wittgenstein’s Tractatus Logico-Philosophicus, including the original in
	German and translations into English, Spanish, French, and Russian. Using this
	corpus, we compute a similarity measure between propositions and render a
	visual network of relations for different languages.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bucur-nisioi:2016:LT4DH</bibkey>
  </paper>

  <paper id="4011">
    <title>A Web-based Tool for the Integrated Annotation of Semantic and Syntactic Structures</title>
    <author><first>Richard</first><last>Eckart de Castilho</last></author>
    <author><first>&#201;va</first><last>M&#250;jdricza-Maydt</last></author>
    <author><first>Seid Muhie</first><last>Yimam</last></author>
    <author><first>Silvana</first><last>Hartmann</last></author>
    <author><first>Iryna</first><last>Gurevych</last></author>
    <author><first>Anette</first><last>Frank</last></author>
    <author><first>Chris</first><last>Biemann</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>76&#8211;84</pages>
    <url>http://aclweb.org/anthology/W16-4011</url>
    <abstract>We introduce the third major release of WebAnno, a generic web-based annotation
	tool for distributed teams. New features in this release focus on semantic
	annotation tasks (e.g. semantic role labelling or event annotation) and allow
	the tight integration of semantic annotations with syntactic annotations. In
	particular, we introduce the concept of slot features, a novel constraint
	mechanism that allows modelling the interaction between semantic and syntactic
	annotations, as well as a new annotation user interface. The new features were
	developed and used in an annotation project for semantic roles on German texts.
	The paper briefly introduces this project and reports on experiences performing
	annotations with the new tool. In a comparative evaluation, our tool reaches
	significant speedups over WebAnno 2 for a semantic annotation task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>eckartdecastilho-EtAl:2016:LT4DH</bibkey>
  </paper>

  <paper id="4012">
    <title>Challenges and Solutions for Latin Named Entity Recognition</title>
    <author><first>Alexander</first><last>Erdmann</last></author>
    <author><first>Christopher</first><last>Brown</last></author>
    <author><first>Brian</first><last>Joseph</last></author>
    <author><first>Mark</first><last>Janse</last></author>
    <author><first>Petra</first><last>Ajaka</last></author>
    <author><first>Micha</first><last>Elsner</last></author>
    <author><first>Marie-Catherine</first><last>de Marneffe</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>85&#8211;93</pages>
    <url>http://aclweb.org/anthology/W16-4012</url>
    <abstract>Although spanning thousands of years and genres as diverse as liturgy,
	historiography, lyric and other forms of prose and poetry, the body of Latin
	texts is still relatively sparse compared to English. Data sparsity in Latin
	presents a number of challenges for traditional Named Entity Recognition
	techniques. Solving such challenges and enabling reliable Named Entity
	Recognition in Latin texts can facilitate many downstream applications, from
	machine translation to digital historiography, enabling Classicists,
	historians, and archaeologists, for instance, to track the relationships of
	historical persons, places, and groups on a large scale. This paper presents
	the first annotated corpus for evaluating Named Entity Recognition in Latin, as
	well as a fully supervised model that achieves over 90% F-score on a held-out
	test set, significantly outperforming a competitive baseline. We also present a
	novel active learning strategy that predicts how many and which sentences need
	to be annotated for named entities in order to attain a specified degree of
	accuracy when recognizing named entities automatically in a given text. This
	maximizes the productivity of annotators while simultaneously controlling
	quality.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>erdmann-EtAl:2016:LT4DH</bibkey>
  </paper>

  <paper id="4013">
    <title>Geographical Visualization of Search Results in Historical Corpora</title>
    <author><first>Florian</first><last>Petran</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>94&#8211;100</pages>
    <url>http://aclweb.org/anthology/W16-4013</url>
    <abstract>We present ANNISVis, a webapp for comparative visualization of geographical
	distribution of linguistic data, as well as a sample deployment for a corpus of
	Middle High German texts. Unlike existing geographical visualization solutions,
	which work with pre-existing data sets, or are bound to specific corpora,
	ANNISVis allows the user to formulate multiple ad-hoc queries and visualizes
	them on a map, and it can be configured for any corpus that can be imported
	into ANNIS. This enables explorative queries of the quantitative aspects of a
	corpus with geographical features. The tool will be made available for
	download as open source.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>petran:2016:LT4DH</bibkey>
  </paper>

  <paper id="4014">
    <title>Implementation of a Workflow Management System for Non-Expert Users</title>
    <author><first>Bart</first><last>Jongejan</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>101&#8211;108</pages>
    <url>http://aclweb.org/anthology/W16-4014</url>
    <abstract>In the Danish CLARIN-DK infrastructure, chaining language technology (LT) tools
	into a workflow is easy even for a non-expert user, because she only needs to
	specify the input and the desired output of the workflow. With this information
	and the registered input and output profiles of the available tools, the
	CLARIN-DK workflow management system (WMS) computes combinations of tools that
	will give the desired result. This advanced functionality was originally not
	envisaged, but came within reach by writing the WMS partly in Java and partly
	in a programming language for symbolic computation, Bracmat. Handling LT tool
	profiles, including the computation of workflows, is easier with Bracmat's
	language constructs for tree pattern matching and tree construction than with
	the language constructs offered by mainstream programming languages.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jongejan:2016:LT4DH</bibkey>
  </paper>

  <paper id="4015">
    <title>Integrating Optical Character Recognition and Machine Translation of Historical Documents</title>
    <author><first>Haithem</first><last>Afli</last></author>
    <author><first>Andy</first><last>Way</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>109&#8211;116</pages>
    <url>http://aclweb.org/anthology/W16-4015</url>
    <abstract>Machine Translation (MT) plays a critical role in expanding capacity in the
	translation industry.
	However, many valuable documents, including digital documents, are encoded in
	formats that are not accessible to machine processing (e.g., historical or
	legal documents).
	Such documents must be passed through a process of Optical Character
	Recognition (OCR) to render the text suitable for MT.
	No matter how good the OCR is, this process introduces recognition
	errors, which often render MT ineffective. In this paper, we propose a new OCR
	to MT framework based on adding a new OCR error correction module to enhance
	the overall quality of translation.
	Experimentation shows that our new correction system, based on the combination
	of Language Modeling and Translation methods, outperforms the baseline system
	by a relative improvement of nearly 30%.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>afli-way:2016:LT4DH</bibkey>
  </paper>

  <paper id="4016">
    <title>Language technology tools and resources for the analysis of multimodal communication</title>
    <author><first>L&#225;szl&#243;</first><last>Hunyadi</last></author>
    <author><first>Tam&#225;s</first><last>V&#225;radi</last></author>
    <author><first>Istv&#225;n</first><last>Szekr&#233;nyes</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>117&#8211;124</pages>
    <url>http://aclweb.org/anthology/W16-4016</url>
    <abstract>In this paper we describe how the complexity of human communication can be
	analysed with the help of language technology. We present the HuComTech corpus,
	a multimodal corpus containing 50 hours of videotaped interviews containing a
	rich annotation of about 2 million items annotated on 33 levels. The corpus
	serves as a general resource for a wide range of research addressing natural
	conversation between humans in their full complexity. It can benefit
	particularly digital humanities researchers working in the field of pragmatics,
	conversational analysis and discourse analysis. We will present a number of
	tools and automated methods that can help such enquiries. In particular, we
	will highlight the tool Theme, which is designed to uncover hidden temporal
	patterns (called T-patterns) in human interaction, and will show how it can be
	applied to the study of multimodal communication.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hunyadi-varadi-szekrenyes:2016:LT4DH</bibkey>
  </paper>

  <paper id="4017">
    <title>Large-scale Analysis of Spoken Free-verse Poetry</title>
    <author><first>Timo</first><last>Baumann</last></author>
    <author><first>Burkhard</first><last>Meyer-Sickendiek</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>125&#8211;130</pages>
    <url>http://aclweb.org/anthology/W16-4017</url>
    <abstract>Most modern and post-modern poems have developed a post-metrical idea of
	lyrical prosody that employs rhythmical features of everyday language and prose
	instead of a strict adherence to rhyme and metrical schemes. This development
	is subsumed under the term free verse prosody. We present our methodology for
	the large-scale analysis of modern and post-modern poetry in both their written
	form and as spoken aloud by the author. We employ language processing tools to
	align text and speech, to generate a null-model of how the poem would be spoken
	by a naïve reader, and to extract contrastive prosodic features used by the
	poet. On these, we intend to build our model of free verse prosody, which will
	help to understand, differentiate and relate the different styles of free verse
	poetry. We plan to use our processing scheme on large amounts of data to
	iteratively build models of styles, to validate and guide manual style
	annotation, to identify further rhythmical categories, and ultimately to
	broaden our understanding of free verse poetry. In this paper, we report on a
	proof-of-concept of our methodology using a small number of poems and a
	limited set of features. We find that our methodology helps to extract
	differentiating features in the authors' speech that can be explained by
	philological insight. Thus, our automatic method helps to guide the literary
	analysis and this in turn helps to improve our computational models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>baumann-meyersickendiek:2016:LT4DH</bibkey>
  </paper>

  <paper id="4018">
    <title>PAT workbench: Annotation and Evaluation of Text and Pictures in Multimodal Instructions</title>
    <author><first>Ielka</first><last>van der Sluis</last></author>
    <author><first>Lennart</first><last>Kloppenburg</last></author>
    <author><first>Gisela</first><last>Redeker</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>131&#8211;139</pages>
    <url>http://aclweb.org/anthology/W16-4018</url>
    <abstract>This paper presents a tool to investigate the design of multimodal instructions
	(MIs), i.e., instructions that contain both text and pictures. The benefit of
	including pictures in information presentation has been established, but the
	characteristics of those pictures and of their textual counterparts and the
	relation(s) between them have not been researched in a systematic manner. We
	present the PAT Workbench, a tool to store, annotate and retrieve MIs based on
	a validated coding scheme with currently 42 categories that describe
	instructions in terms of textual features, pictorial elements, and relations
	between text and pictures. We describe how the PAT Workbench facilitates
	collaborative annotation and inter-annotator agreement calculation. Future work
	on the tool includes expanding its functionality and usability by (i) making
	the MI annotation scheme dynamic for adding relevant features based on
	empirical evaluations of the MIs, (ii) implementing algorithms for automatic
	tagging of MI features, and (iii) implementing automatic MI evaluation
	algorithms based on results obtained via e.g. crowdsourced assessments of MIs.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>vandersluis-kloppenburg-redeker:2016:LT4DH</bibkey>
  </paper>

  <paper id="4019">
    <title>Semantic Indexing of Multilingual Corpora and its Application on the History Domain</title>
    <author><first>Alessandro</first><last>Raganato</last></author>
    <author><first>Jose</first><last>Camacho-Collados</last></author>
    <author><first>Antonio</first><last>Raganato</last></author>
    <author><first>Yunseo</first><last>Joung</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>140&#8211;147</pages>
    <url>http://aclweb.org/anthology/W16-4019</url>
    <abstract>The increasing amount of multilingual text collections available in different
	domains makes their automatic processing essential for the development of a given
	field. However, standard processing techniques based on statistical clues and
	keyword searches have clear limitations. Instead, we propose a knowledge-based
	processing pipeline which overcomes most of the limitations of these
	techniques. This, in turn, enables direct comparison across texts in different
	languages without the need of translation. In this paper we show the potential
	of this approach for semantically indexing multilingual text collections in the
	history domain. In our experiments we used a version of the Bible translated
	into four different languages, evaluating the precision of our semantic indexing
	pipeline and showing its reliability on the cross-lingual text retrieval task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>raganato-EtAl:2016:LT4DH</bibkey>
  </paper>

  <paper id="4020">
    <title>Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work</title>
    <author><first>J&#246;rg</first><last>Tiedemann</last></author>
    <author><first>Johanna</first><last>Nichols</last></author>
    <author><first>Ronald</first><last>Sprouse</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>148&#8211;155</pages>
    <url>http://aclweb.org/anthology/W16-4020</url>
    <abstract>This paper presents ongoing work on creating NLP tools for under-resourced
	languages from very sparse training data coming from linguistic field work. In
	this work, we focus on Ingush, a Nakh-Daghestanian language spoken by about
	300,000 people in the Russian republics of Ingushetia and Chechnya. We present
	work on morphosyntactic taggers trained on transcribed and linguistically
	analyzed recordings, and on dependency parsers that use English glosses to
	project annotations for creating synthetic treebanks. Our preliminary results
	are promising, supporting the goal of bootstrapping efficient NLP tools with
	limited or no task-specific annotated data resources available.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tiedemann-nichols-sprouse:2016:LT4DH</bibkey>
  </paper>

  <paper id="4021">
    <title>The MultiTal NLP tool infrastructure</title>
    <author><first>Driss</first><last>Sadoun</last></author>
    <author><first>Satenik</first><last>Mkhitaryan</last></author>
    <author><first>Damien</first><last>Nouvel</last></author>
    <author><first>Mathieu</first><last>Valette</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>156&#8211;163</pages>
    <url>http://aclweb.org/anthology/W16-4021</url>
    <abstract>This paper gives an overview of the MultiTal project, which aims to create a
	research infrastructure that ensures long-term distribution of NLP tool
	descriptions. The goal is to make NLP tools more accessible and usable for
	end-users of different disciplines.
	The infrastructure is built on a meta-data scheme modelling and standardising
	multilingual NLP tool documentation. The model is conceptualised using an OWL
	ontology. The formal representation of the ontology allows us to automatically
	generate organised and structured documentation in different languages for each
	represented tool.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sadoun-EtAl:2016:LT4DH</bibkey>
  </paper>

  <paper id="4022">
    <title>Tools and Instruments for Building and Querying Diachronic Computational Lexica</title>
    <author><first>Fahad</first><last>Khan</last></author>
    <author><first>Andrea</first><last>Bellandi</last></author>
    <author><first>Monica</first><last>Monachini</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>164&#8211;171</pages>
    <url>http://aclweb.org/anthology/W16-4022</url>
    <abstract>This article describes work on enabling the addition of temporal information to
	senses of words in linguistic linked open data lexica based on the lemonDia
	model. Our contribution in this article is twofold. On the one hand, we
	demonstrate how lemonDia enables the querying of diachronic lexical datasets
	using OWL-oriented, Semantic Web-based technologies. On the other hand, we
	present a preliminary version of an interactive interface intended to help
	users in creating lexical datasets that model meaning change over time.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>khan-bellandi-monachini:2016:LT4DH</bibkey>
  </paper>

  <paper id="4023">
    <title>Tracking Words in Chinese Poetry of Tang and Song Dynasties with the China Biographical Database</title>
    <author><first>Chao-Lin</first><last>Liu</last></author>
    <author><first>Kuo-Feng</first><last>Luo</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>172&#8211;180</pages>
    <url>http://aclweb.org/anthology/W16-4023</url>
    <abstract>Large-scale comparisons between the poetry of the Tang and Song dynasties shed
	light on how words and expressions were used and shared among the poets. That
	some words were used only in the Tang poetry and some only in the Song poetry
	could lead to interesting research in linguistics. That the most frequent
	colors are different in the Tang and Song poetry provides a trace of the
	changing social circumstances in the dynasties. Results of the current work
	link to research topics in lexicography, semantics, and social transitions. We
	discuss our findings and present our algorithms for efficient comparisons among
	the poems, which are crucial for completing billions of comparisons within
	an acceptable time.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>liu-luo:2016:LT4DH</bibkey>
  </paper>

  <paper id="4024">
    <title>Using TEI for textbook research</title>
    <author><first>Lena-Luise</first><last>Stahn</last></author>
    <author><first>Steffen</first><last>Hennicke</last></author>
    <author><first>Ernesto William</first><last>De Luca</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>181&#8211;186</pages>
    <url>http://aclweb.org/anthology/W16-4024</url>
    <abstract>The following paper describes the first steps in the development of an ontology
	for the textbook research discipline. The aim of the project WorldViews is to
	establish a digital edition focussing on views of the world depicted in
	textbooks. For this purpose an initial TEI profile has been formalised and
	tested as a use case to enable the semantic encoding of the resource
	'textbook'. This profile shall provide a basic data model describing major
	facets of the textbook's structure relevant to historians.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>stahn-hennicke-deluca:2016:LT4DH</bibkey>
  </paper>

  <paper id="4025">
    <title>Web services and data mining: combining linguistic tools for Polish with an analytical platform</title>
    <author><first>Maciej</first><last>Ogrodniczuk</last></author>
    <booktitle>Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>187&#8211;195</pages>
    <url>http://aclweb.org/anthology/W16-4025</url>
    <abstract>In this paper we present a new combination of existing language tools for
	Polish with a popular data mining platform intended to help researchers from
	digital humanities perform computational analyses without any programming. The
	toolset includes RapidMiner Studio, a software solution offering graphical
	setup of integrated analytical processes, and Multiservice, a Web service
	offering access to several state-of-the-art linguistic tools for Polish. The
	setting is verified in a simple task of counting frequencies of unknown words
	in a small corpus.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ogrodniczuk:2016:LT4DH</bibkey>
  </paper>

</volume>

