<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="1400">
    <title>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</title>
    <editor>Toma&#x17E; Erjavec</editor>
    <editor>Jakub Piskorski</editor>
    <editor>Lidia Pivovarova</editor>
    <editor>Jan &#x160;najder</editor>
    <editor>Josef Steinberger</editor>
    <editor>Roman Yangarber</editor>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-14</url>
    <bibtype>book</bibtype>
    <bibkey>BSNLP:2017</bibkey>
  </paper>

  <paper id="1401">
    <title>Toward Pan-Slavic NLP: Some Experiments with Language Adaptation</title>
    <author><first>Serge</first><last>Sharoff</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;2</pages>
    <url>http://www.aclweb.org/anthology/W17-1401</url>
    <abstract>There is great variation in the amount of NLP resources available for Slavonic
	languages. For example, the Universal Dependency treebank (Nivre et al., 2016)
	has about 2 MW of training resources for Czech, more than 1 MW for Russian,
	but only 950 words for Ukrainian and none for Belarusian, Bosnian or
	Macedonian. Similarly, the Autodesk Machine Translation dataset only covers
	three Slavonic languages (Czech, Polish and Russian). In this talk I will
	discuss a general approach, which can be called Language Adaptation, by
	analogy with Domain Adaptation. In this approach, a model for a particular language
	processing task is built by lexical transfer of cognate words and by learning a
	new feature representation for a lesser-resourced (recipient) language starting
	from a better-resourced (donor) language. More specifically, I will demonstrate
	how language adaptation works in such training scenarios as Translation Quality
	Estimation, Part-of-Speech tagging and Named Entity Recognition.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sharoff:2017:BSNLP</bibkey>
  </paper>

  <paper id="1402">
    <title>Clustering of Russian Adjective-Noun Constructions using Word Embeddings</title>
    <author><first>Andrey</first><last>Kutuzov</last></author>
    <author><first>Elizaveta</first><last>Kuzmenko</last></author>
    <author><first>Lidia</first><last>Pivovarova</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>3&#8211;13</pages>
    <url>http://www.aclweb.org/anthology/W17-1402</url>
    <abstract>This paper presents a method of automatic construction extraction from a large
	corpus of Russian. The term 'construction' here means a multi-word expression
	in which a variable can be replaced with another word from the same semantic
	class, for example, 'a glass of [water/juice/milk]'. We deal with constructions
	that consist of a noun and its adjective modifier. We propose a method of
	grouping such constructions into semantic classes via 2-step clustering of word
	vectors in distributional models. We compare it with other clustering
	techniques and evaluate it against A Russian-English Collocational Dictionary
	of the Human Body that contains manually annotated groups of constructions with
	nouns meaning human body parts.
	The best performing method is used to cluster all adjective-noun bigrams in the
	Russian National Corpus. Results of this procedure are publicly available and
	can be used for building a Russian construction dictionary as well as for
	accelerating theoretical studies of constructions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kutuzov-kuzmenko-pivovarova:2017:BSNLP</bibkey>
  </paper>

  <paper id="1403">
    <title>A Preliminary Study of Croatian Lexical Substitution</title>
    <author><first>Domagoj</first><last>Alagi&#x107;</last></author>
    <author><first>Jan</first><last>&#x160;najder</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>14&#8211;19</pages>
    <url>http://www.aclweb.org/anthology/W17-1403</url>
    <abstract>Lexical substitution is the task of determining a meaning-preserving replacement
	for a word in context. We report on a preliminary study of this task for the
	Croatian language on a small-scale lexical sample dataset, manually annotated
	using three different annotation schemes. We compare the annotations, analyze
	the inter-annotator agreement, and observe a number of interesting language
	specific details in the obtained lexical substitutes. Furthermore, we apply a
	recently-proposed, dependency-based lexical substitution model to our dataset.
	The model achieves a P@3 score of 0.35, which indicates the difficulty of the
	task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>alagic-vsnajder:2017:BSNLP</bibkey>
  </paper>

  <paper id="1404">
    <title>Projecting Multiword Expression Resources on a Polish Treebank</title>
    <author><first>Agata</first><last>Savary</last></author>
    <author><first>Jakub</first><last>Waszczuk</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>20&#8211;26</pages>
    <url>http://www.aclweb.org/anthology/W17-1404</url>
    <abstract>Multiword expressions (MWEs) are linguistic objects containing two or more
	words and showing idiosyncratic behavior at different levels. Treebanks with
	annotated MWEs enable studies of such properties, as well as training and
	evaluation of MWE-aware parsers. However, few treebanks contain full-fledged
	MWE annotations. We show how this gap can be bridged in Polish by projecting
	three MWE resources onto a constituency treebank.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>savary-waszczuk:2017:BSNLP</bibkey>
  </paper>

  <paper id="1405">
    <title>Lexicon Induction for Spoken Rusyn &#8211; Challenges and Results</title>
    <author><first>Achim</first><last>Rabus</last></author>
    <author><first>Yves</first><last>Scherrer</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>27&#8211;32</pages>
    <url>http://www.aclweb.org/anthology/W17-1405</url>
    <abstract>This paper reports on challenges and results in developing NLP resources for
	spoken Rusyn. As a Slavic minority language, Rusyn has hardly any existing
	resources to make use of. We propose to build a morphosyntactic dictionary for
	Rusyn, combining existing resources from the etymologically close Slavic
	languages Russian, Ukrainian, Slovak, and Polish. We adapt these resources to
	Rusyn by using vowel-sensitive Levenshtein distance, hand-written
	language-specific transformation rules, and combinations of the two. Compared
	to an exact match baseline, we increase the coverage of the resulting
	morphological dictionary by up to 77.4% relative (42.9% absolute), which
	results in an increase in tagging recall of 11.6% relative (9.1% absolute). Our
	research confirms and expands the results of previous studies showing the
	effectiveness of using NLP resources from neighboring languages for low-resourced
	languages.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rabus-scherrer:2017:BSNLP</bibkey>
  </paper>

  <paper id="1406">
    <title>The Universal Dependencies Treebank for Slovenian</title>
    <author><first>Kaja</first><last>Dobrovoljc</last></author>
    <author><first>Toma&#x17E;</first><last>Erjavec</last></author>
    <author><first>Simon</first><last>Krek</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>33&#8211;38</pages>
    <url>http://www.aclweb.org/anthology/W17-1406</url>
    <abstract>This paper introduces the Universal Dependencies Treebank for Slovenian. We
	overview the existing dependency treebanks for Slovenian and then detail the
	conversion of the ssj500k treebank to the framework of Universal Dependencies
	version 2. We explain the mapping of part-of-speech categories, morphosyntactic
	features, and the dependency relations, focusing on the more problematic
	language-specific issues. We conclude with a quantitative overview of the
	treebank and directions for further work.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dobrovoljc-erjavec-krek:2017:BSNLP</bibkey>
  </paper>

  <paper id="1407">
    <title>Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages</title>
    <author><first>Tanja</first><last>Samard&#x17E;i&#x107;</last></author>
    <author><first>Mirjana</first><last>Starovi&#x107;</last></author>
    <author><first>&#x17D;eljko</first><last>Agi&#x107;</last></author>
    <author><first>Nikola</first><last>Ljube&#x161;i&#x107;</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>39&#8211;44</pages>
    <url>http://www.aclweb.org/anthology/W17-1407</url>
    <abstract>The paper documents the procedure of building a new Universal Dependencies
	(UDv2) treebank for Serbian starting from an existing Croatian UDv1 treebank
	and taking into account the other Slavic UD annotation guidelines. We describe
	the automatic and manual annotation procedures, discuss the annotation of
	Slavic-specific categories (case governing quantifiers, reflexive pronouns,
	question particles) and propose an approach to handling deverbal nouns in
	Slavic languages.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>samardvzic-EtAl:2017:BSNLP</bibkey>
  </paper>

  <paper id="1408">
    <title>Spelling Correction for Morphologically Rich Language: a Case Study of Russian</title>
    <author><first>Alexey</first><last>Sorokin</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>45&#8211;53</pages>
    <url>http://www.aclweb.org/anthology/W17-1408</url>
    <abstract>We present an algorithm for automatic correction of spelling errors on the
	sentence level, which uses a noisy channel model and feature-based reranking of
	hypotheses. Our system is designed for Russian and clearly outperforms the
	winner of the SpellRuEval-2016 competition. We show that language model size has
	the greatest influence on spelling correction quality. We also experiment with
	different types of features and show that morphological and semantic
	information also improves the accuracy of spellchecking.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sorokin:2017:BSNLP</bibkey>
  </paper>

  <paper id="1409">
    <title>Debunking Sentiment Lexicons: A Case of Domain-Specific Sentiment Classification for Croatian</title>
    <author><first>Paula</first><last>Gombar</last></author>
    <author><first>Zoran</first><last>Medi&#x107;</last></author>
    <author><first>Domagoj</first><last>Alagi&#x107;</last></author>
    <author><first>Jan</first><last>&#x160;najder</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>54&#8211;59</pages>
    <url>http://www.aclweb.org/anthology/W17-1409</url>
    <abstract>Sentiment lexicons are widely used as an intuitive and inexpensive way of
	tackling sentiment classification, often within a simple lexicon word-counting
	approach or as part of a supervised model. However, it is an open question
	whether these approaches can compete with supervised models that use only
	word-representation features. We address this question in the context of
	domain-specific sentiment classification for Croatian. We experiment with the
	graph-based acquisition of sentiment lexicons, analyze their quality, and
	investigate how effectively they can be used in sentiment classification. Our
	results indicate that, even with as few as 500 labeled instances, a supervised
	model substantially outperforms a word-counting model. We also observe that
	adding lexicon-based features does not significantly improve supervised
	sentiment classification.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gombar-EtAl:2017:BSNLP</bibkey>
  </paper>

  <paper id="1410">
    <title>Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text</title>
    <author><first>Nikola</first><last>Ljube&#x161;i&#x107;</last></author>
    <author><first>Toma&#x17E;</first><last>Erjavec</last></author>
    <author><first>Darja</first><last>Fi&#x161;er</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>60&#8211;68</pages>
    <url>http://www.aclweb.org/anthology/W17-1410</url>
    <abstract>In this paper we present the adaptation of a state-of-the-art tagger for South
	Slavic languages to non-standard texts, taking the Slovene language as an example.
	We investigate the impact of introducing in-domain training data as well as
	additional supervision through external resources or tools like word clusters
	and word normalization. We remove more than half of the error of the standard
	tagger when applied to non-standard texts by training it on a combination of
	standard and non-standard training data, while enriching the data
	representation with external resources removes an additional 11 percent of the
	error. The final configuration achieves tagging accuracy of 87.41% on the full
	morphosyntactic description, which is, nevertheless, still quite far from the
	accuracy of 94.27% achieved on standard text.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ljubevsic-erjavec-fivser:2017:BSNLP</bibkey>
  </paper>

  <paper id="1411">
    <title>Comparison of Short-Text Sentiment Analysis Methods for Croatian</title>
    <author><first>Leon</first><last>Rotim</last></author>
    <author><first>Jan</first><last>&#x160;najder</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>69&#8211;75</pages>
    <url>http://www.aclweb.org/anthology/W17-1411</url>
    <abstract>We focus on the task of supervised sentiment classification of short and
	informal texts in Croatian, using two simple yet effective methods: word
	embeddings and string kernels. We investigate whether word embeddings offer any
	advantage over corpus- and preprocessing-free string kernels, and how these
	compare to bag-of-words baselines. We conduct a comparison on three different
	datasets, using different preprocessing methods and kernel functions. Results
	show that, on two out of three datasets, word embeddings outperform string
	kernels, which in turn outperform word and n-gram bag-of-words baselines.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rotim-vsnajder:2017:BSNLP</bibkey>
  </paper>

  <paper id="1412">
    <title>The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in Slavic Languages</title>
    <author><first>Jakub</first><last>Piskorski</last></author>
    <author><first>Lidia</first><last>Pivovarova</last></author>
    <author><first>Jan</first><last>&#x160;najder</last></author>
    <author><first>Josef</first><last>Steinberger</last></author>
    <author><first>Roman</first><last>Yangarber</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>76&#8211;85</pages>
    <url>http://www.aclweb.org/anthology/W17-1412</url>
    <abstract>This paper describes the outcomes of the first challenge on multilingual named
	entity recognition that aimed at recognizing mentions of named entities in web
	documents in Slavic languages, their normalization/lemmatization, and
	cross-language matching. It was organised in the context of the 6th
	Balto-Slavic Natural Language Processing Workshop, co-located with the EACL
	2017 conference. Although eleven teams signed up for the evaluation, due to the
	complexity of the task(s) and the short time available for elaborating a solution,
	only two teams submitted results on time. The reported evaluation figures
	reflect the relatively higher level of complexity of named entity-related tasks
	in the context of processing texts in Slavic languages. Since the duration of
	the challenge extends beyond the publication date of this paper, an updated
	picture of the participating systems and their corresponding performance can be
	found on the web page of the challenge.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>piskorski-EtAl:2017:BSNLP</bibkey>
  </paper>

  <paper id="1413">
    <title>Liner2 — a Generic Framework for Named Entity Recognition</title>
    <author><first>Micha&#x142;</first><last>Marci&#x144;czuk</last></author>
    <author><first>Jan</first><last>Koco&#x144;</last></author>
    <author><first>Marcin</first><last>Oleksy</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>86&#8211;91</pages>
    <url>http://www.aclweb.org/anthology/W17-1413</url>
    <abstract>In this paper we present an adaptation of the Liner2 framework to solve the BSNLP
	2017 shared task on multilingual named entity recognition. The tool is tuned to
	recognize and lemmatize named entities for Polish.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>marcinczuk-kocon-oleksy:2017:BSNLP</bibkey>
  </paper>

  <paper id="1414">
    <title>Language-Independent Named Entity Analysis Using Parallel Projection and Rule-Based Disambiguation</title>
    <author><first>James</first><last>Mayfield</last></author>
    <author><first>Paul</first><last>McNamee</last></author>
    <author><first>Cash</first><last>Costello</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>92&#8211;96</pages>
    <url>http://www.aclweb.org/anthology/W17-1414</url>
    <abstract>The 2017 shared task at the Balto-Slavic NLP workshop requires identifying
	coarse-grained named entities in seven languages, identifying each entity&#8217;s
	base form, and clustering name mentions across the multilingual set of
	documents. The fact that no training data is provided to systems for building
	supervised classifiers further adds to the complexity. To complete the task we
	first use publicly available parallel texts to project named entity recognition
	capability from English to each evaluation language. We ignore entirely the
	subtask of identifying non-inflected forms of names. Finally, we create
	cross-document entity identifiers by clustering name mentions using a
	procedure-based approach.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mayfield-mcnamee-costello:2017:BSNLP</bibkey>
  </paper>

  <paper id="1415">
    <title>Comparison of String Similarity Measures for Obscenity Filtering</title>
    <author><first>Ekaterina</first><last>Chernyak</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>97&#8211;101</pages>
    <url>http://www.aclweb.org/anthology/W17-1415</url>
    <abstract>In this paper we address the problem of filtering obscene lexis in Russian
	texts. We use string similarity measures to find words similar or identical to
	words from a stop list and establish both a test collection and a baseline for
	the task. Our experiments show that a novel string similarity measure based on
	the notion of an annotated suffix tree outperforms some of the other well-known
	measures.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chernyak:2017:BSNLP</bibkey>
  </paper>

  <paper id="1416">
    <title>Stylometric Analysis of Parliamentary Speeches: Gender Dimension</title>
    <author><first>Justina</first><last>Mandravickaite</last></author>
    <author><first>Tomas</first><last>Krilavi&#x10D;ius</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>102&#8211;107</pages>
    <url>http://www.aclweb.org/anthology/W17-1416</url>
    <abstract>The relation between gender and language has been studied by many authors;
	however, some uncertainty remains regarding the influence of gender on language
	usage in the professional environment. Often, the studied data sets are too
	small, or the texts of individual authors are too short, to successfully capture
	differences in language usage with respect to gender. This study draws on a
	larger corpus of transcripts of speeches in the Lithuanian Parliament (1990-2013)
	to explore gender-related language differences in political debates via stylometric
	analysis. The experimental setup consists of stylistic features that indicate
	lexical style and do not require external linguistic tools, namely the most
	frequent words, in combination with unsupervised machine learning algorithms.
	Results show that gender differences in language use persist in the professional
	environment, not only in the usage of function words and preferred linguistic
	constructions, but in the topics presented as well.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mandravickaite-krilavivcius:2017:BSNLP</bibkey>
  </paper>

  <paper id="1417">
    <title>Towards Never Ending Language Learning for Morphologically Rich Languages</title>
    <author><first>Kseniya</first><last>Buraya</last></author>
    <author><first>Lidia</first><last>Pivovarova</last></author>
    <author><first>Sergey</first><last>Budkov</last></author>
    <author><first>Andrey</first><last>Filchenkov</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>108&#8211;118</pages>
    <url>http://www.aclweb.org/anthology/W17-1417</url>
    <abstract>This work deals with ontology learning from unstructured Russian text. We
	implement one of the components of the Never-Ending Language Learner and introduce
	algorithm extensions aimed at capturing the specificity of a morphologically rich
	free-word-order language. We demonstrate that this method can be successfully
	applied to Russian data. In addition, we perform several experiments
	comparing different settings of the training process. We demonstrate that
	utilizing morphological features significantly improves the system's precision,
	while using seed patterns helps to improve the coverage.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>buraya-EtAl:2017:BSNLP</bibkey>
  </paper>

  <paper id="1418">
    <title>Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style</title>
    <author><first>Ben</first><last>Verhoeven</last></author>
    <author><first>Iza</first><last>&#x160;krjanec</last></author>
    <author><first>Senja</first><last>Pollak</last></author>
    <booktitle>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>119&#8211;125</pages>
    <url>http://www.aclweb.org/anthology/W17-1418</url>
    <abstract>We present results of the first gender classification experiments on Slovene
	text to our knowledge. Inspired by the TwiSty corpus and experiments (Verhoeven
	et al., 2016), we employed the Janes corpus (Erjavec et al., 2016) and its
	gender annotations to perform gender classification experiments on Twitter text
	comparing a token-based and a lemma-based approach. We find that the
	token-based approach (92.6% accuracy), containing gender markings related to
	the author, outperforms the lemma-based approach by about 5%. Especially in the
	lemmatized version, we also observe stylistic and content-based differences in
	writing between men (e.g. more profane language, numerals and beer mentions)
	and women (e.g. more pronouns, emoticons and character flooding). Many of our
	findings corroborate previous research on other languages.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>verhoeven-vskrjanec-pollak:2017:BSNLP</bibkey>
  </paper>

</volume>

