<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="8100" href="http://doi.org/10.26615/978-954-452-046-5_">
    <title>Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe</title>
    <editor>Anca Dinu</editor>
    <editor>Petya Osenova</editor>
    <editor>Cristina Vertan</editor>
    <month>September</month>
    <year>2017</year>
    <address>Varna</address>
    <publisher>INCOMA Inc.</publisher>
    <doi>10.26615/978-954-452-046-5_</doi>
    <url>http://doi.org/10.26615/978-954-452-046-5_</url>
    <bibtype>book</bibtype>
    <bibkey>LT4DH-CEE:2017</bibkey>
  </paper>

  <paper id="8101" href="http://doi.org/10.26615/978-954-452-046-5_001">
    <title>A Diachronic Corpus for Romanian (RoDia)</title>
    <author><first>Ludmila</first><last>Malahov</last></author>
    <author><first>C&#x103;t&#x103;lina</first><last>M&#x103;r&#x103;nduc</last></author>
    <author><first>Alexandru</first><last>Colesnicov</last></author>
    <booktitle>Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Varna</address>
    <publisher>INCOMA Inc.</publisher>
    <pages>1&#8211;9</pages>
    <doi>10.26615/978-954-452-046-5_001</doi>
    <url>http://doi.org/10.26615/978-954-452-046-5_001</url>
    <abstract>This paper describes a Romanian Dependency Treebank, built at the Al. I. Cuza
	University (UAIC), and the special OCR techniques used to build it. The corpus
	has rich morphological and syntactic annotation. There are few annotated
	representative corpora in Romanian, and the existing ones focus mainly on
	the contemporary Romanian standard. The corpus described below focuses on
	the non-standard aspects of the language, namely Regional and Old Romanian.
	Intending to participate in the PROIEL project, which aligns the oldest
	New Testaments, we annotate the first printed Romanian New Testament (Alba
	Iulia, 1648). We began by applying the UAIC tools for the morphological and
	syntactic processing of Contemporary Romanian to the book’s first quarter
	(second edition). By carefully correcting by hand the output of the automated
	annotation (which has modest accuracy), we obtained a sub-corpus for training
	tools that process Old Romanian. However, the first edition of the New
	Testament is written in Cyrillic letters. The existence of books printed in the
	Old Cyrillic alphabet is a common problem for Romania and the Republic of
	Moldova, the countries where Romanian is spoken; it is a problem to be solved by
	the joint efforts of NLP researchers in the two countries.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>malahov-mvarvanduc-colesnicov:2017:LT4DH-CEE</bibkey>
  </paper>

  <paper id="8102" href="http://doi.org/10.26615/978-954-452-046-5_002">
    <title>Tools for Building a Corpus to Study the Historical and Geographical Variation of the Romanian Language</title>
    <author><first>Victoria</first><last>Bobicev</last></author>
    <author><first>C&#x103;t&#x103;lina</first><last>M&#x103;r&#x103;nduc</last></author>
    <author><first>Cenel Augusto</first><last>Perez</last></author>
    <booktitle>Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Varna</address>
    <publisher>INCOMA Inc.</publisher>
    <pages>10&#8211;19</pages>
    <doi>10.26615/978-954-452-046-5_002</doi>
    <url>http://doi.org/10.26615/978-954-452-046-5_002</url>
    <abstract>Contemporary standard language corpora are ideal for NLP. There are few
	morphologically and syntactically annotated corpora for Romanian, and those
	existing or in progress only deal with the Contemporary Romanian standard.
	However, the necessity to study the dynamics of natural languages gave rise to
	balanced corpora, containing non-standard texts. In this paper, we describe the
	creation of tools for processing non-standard Romanian to build a large balanced
	corpus. We want to preserve in annotated form as many early stages of the language
	as possible. We have already built a corpus of Old Romanian. We also intend to
	include the South-Danube dialects, remote from the standard language, along with
	regional forms closer to the standard. We try to preserve data about endangered
	idioms such as Aromanian, Meglenoromanian and Istroromanian dialects, and
	calculate the distance between different regional variants, including the
	language spoken in the Republic of Moldova. This distance, as well as the
	mutual understanding between the speakers, is the correct criterion for the
	classification of idioms as different languages, or as dialects, or as regional
	variants close to the standard.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bobicev-mvarvanduc-perez:2017:LT4DH-CEE</bibkey>
  </paper>

  <paper id="8103" href="http://doi.org/10.26615/978-954-452-046-5_003">
    <title>Multilingual Ontologies for the Representation and Processing of Folktales</title>
    <author><first>Thierry</first><last>Declerck</last></author>
    <author><first>Anastasija</first><last>Aman</last></author>
    <author><first>Martin</first><last>Banzer</last></author>
    <author><first>Dominik</first><last>Mach&#225;&#x10D;ek</last></author>
    <author><first>Lisa</first><last>Sch&#228;fer</last></author>
    <author><first>Natalia</first><last>Skachkova</last></author>
    <booktitle>Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Varna</address>
    <publisher>INCOMA Inc.</publisher>
    <pages>20&#8211;23</pages>
    <doi>10.26615/978-954-452-046-5_003</doi>
    <url>http://doi.org/10.26615/978-954-452-046-5_003</url>
    <abstract>We describe work in the field of folkloristics that consists in creating
	ontologies based on well-established studies proposed by "classical"
	folklorists. This work supports the availability of a large amount of
	digital and structured knowledge on folktales to digital humanists. The
	ontological encoding of past and current motif-indexation and classification
	systems for folktales was, in the first step, limited to English language data.
	This led us to focus on making those newly generated formal knowledge sources
	available in a few more languages, such as German, Russian and Bulgarian. We
	stress the importance of achieving this multilingual extension of our
	ontologies at a larger scale, in order, for example, to support the automated
	analysis and classification of such narratives in a large variety of languages,
	as those are becoming more and more accessible on the Web.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>declerck-EtAl:2017:LT4DH-CEE</bibkey>
  </paper>

  <paper id="8104" href="http://doi.org/10.26615/978-954-452-046-5_004">
    <title>On the annotation of vague expressions: a case study on Romanian historical texts</title>
    <author><first>Anca</first><last>Dinu</last></author>
    <author><first>Walther</first><last>von Hahn</last></author>
    <author><first>Cristina</first><last>Vertan</last></author>
    <booktitle>Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Varna</address>
    <publisher>INCOMA Inc.</publisher>
    <pages>24&#8211;31</pages>
    <doi>10.26615/978-954-452-046-5_004</doi>
    <url>http://doi.org/10.26615/978-954-452-046-5_004</url>
    <abstract>Current approaches in Digital Humanities tend to ignore a central aspect of
	any hermeneutic introspection: the intrinsic vagueness of analyzed texts.
	Especially when dealing with historical documents, neglecting vagueness has
	important implications for the interpretation of the results. In this paper we
	present the current limitations of annotation approaches and describe a current
	methodology for annotating vagueness in historical Romanian texts.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dinu-vonhahn-vertan:2017:LT4DH-CEE</bibkey>
  </paper>

  <paper id="8105" href="http://doi.org/10.26615/978-954-452-046-5_005">
    <title>Language Technologies in Teaching Bulgarian at Primary and Secondary School Level: the NBU Platform of Language Teaching (PLT)</title>
    <author><first>Maria</first><last>Stambolieva</last></author>
    <author><first>Valentina</first><last>Ivanova</last></author>
    <author><first>Mariana</first><last>Raykova</last></author>
    <author><first>Milka</first><last>Hadjikoteva</last></author>
    <author><first>Mariya</first><last>Neykova</last></author>
    <booktitle>Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Varna</address>
    <publisher>INCOMA Inc.</publisher>
    <pages>32&#8211;38</pages>
    <doi>10.26615/978-954-452-046-5_005</doi>
    <url>http://doi.org/10.26615/978-954-452-046-5_005</url>
    <abstract>The NBU Language Teaching Platform (PLT) was initially designed for teaching
	foreign languages for specific purposes; at a second stage, some of its
	functionalities were extended to answer the needs of teaching general foreign
	language. New functionalities have now been created for the purpose of
	providing e-support for Bulgarian language and literature teaching at primary
	and secondary school level. The article presents the general structure of the
	platform and the functionalities specifically developed to match the standards
	and expected results set by the Ministry of Education.
	   The E-platform integrates: 1/ an environment for creating, organizing and
	maintaining electronic text archives, for extracting text corpora and aligning
	corpora; 2/ a linguistic database; 3/ a concordancer; 4/ a set of modules for
	the generation and editing of practice exercises for each text or corpus; 5/
	functionalities for export from the platform and import to other educational
	platforms. For Moodle, modules were created for test generation, performance
	assessment and feedback.
	   The PLT allows centralized presentation of abundant teaching content,
	control of the educational process, fast and reliable feedback on performance.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>stambolieva-EtAl:2017:LT4DH-CEE</bibkey>
  </paper>

  <paper id="8106" href="http://doi.org/10.26615/978-954-452-046-5_006">
    <title>Natural Language Processing in Political Campaigns</title>
    <author><first>Cristina</first><last>Moise</last></author>
    <booktitle>Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Varna</address>
    <publisher>INCOMA Inc.</publisher>
    <pages>39&#8211;43</pages>
    <doi>10.26615/978-954-452-046-5_006</doi>
    <url>http://doi.org/10.26615/978-954-452-046-5_006</url>
    <abstract>This paper presents the Majoritas ecosystem, which provides a complete overview of
	political campaign assessment and aims to assist politicians and their staff in
	delivering consistent and personalized messages on social media.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>moise:2017:LT4DH-CEE</bibkey>
  </paper>

</volume>

