Stefan Evert


2021

FAST: A carefully sampled and cognitively motivated dataset for distributional semantic evaluation
Stefan Evert | Gabriella Lapesa
Proceedings of the 25th Conference on Computational Natural Language Learning

What is the first word that comes to your mind when you hear giraffe, or damsel, or freedom? Such free associations contain a huge amount of information on the mental representations of the corresponding concepts, and are thus an extremely valuable testbed for the evaluation of semantic representations extracted from corpora. In this paper, we present FAST (Free ASsociation Tasks), a free association dataset for English rigorously sampled from two standard free association norms collections (the Edinburgh Associative Thesaurus and the University of South Florida Free Association Norms), discuss two evaluation tasks, and provide baseline results. In parallel, we discuss methodological considerations concerning the desiderata for a proper evaluation of semantic representations.

2020

Corpus Query Lingua Franca part II: Ontology
Stefan Evert | Oleg Harlamov | Philipp Heinrich | Piotr Banski
Proceedings of the Twelfth Language Resources and Evaluation Conference

The present paper outlines the projected second part of the Corpus Query Lingua Franca (CQLF) family of standards: CQLF Ontology, which is currently in the process of standardization at the International Standards Organization (ISO), in its Technical Committee 37, Subcommittee 4 (TC37SC4) and its national mirrors. The first part of the family, ISO 24623-1 (henceforth CQLF Metamodel), was successfully adopted as an international standard at the beginning of 2018. The present paper reflects the state of the CQLF Ontology at the moment of submission for the Committee Draft ballot. We provide a brief overview of the CQLF Metamodel, present the assumptions and aims of the CQLF Ontology, its basic structure, and its potential extended applications. The full ontology is expected to emerge from a community process, starting from an initial version created by the authors of the present paper.

EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus
Thomas Proisl | Natalie Dykes | Philipp Heinrich | Besim Kabashi | Andreas Blombach | Stefan Evert
Proceedings of the Twelfth Language Resources and Evaluation Conference

The EmpiriST corpus (Beißwenger et al., 2016) is a manually tokenized and part-of-speech tagged corpus of approximately 23,000 tokens of German Web and CMC (computer-mediated communication) data. We extend the corpus with manually created annotation layers for word form normalization, lemmatization and lexical semantics. All annotations have been independently performed by multiple human annotators. We report inter-annotator agreements and results of baseline systems and state-of-the-art off-the-shelf tools.

2018

Delta vs. N-Gram Tracing: Evaluating the Robustness of Authorship Attribution Methods
Thomas Proisl | Stefan Evert | Fotis Jannidis | Christof Schöch | Leonard Konle | Steffen Pielström
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

EmotiKLUE at IEST 2018: Topic-Informed Classification of Implicit Emotions
Thomas Proisl | Philipp Heinrich | Besim Kabashi | Stefan Evert
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

EmotiKLUE is a submission to the Implicit Emotion Shared Task. It is a deep learning system that combines independent representations of the left and right contexts of the emotion word with the topic distribution of an LDA topic model. EmotiKLUE achieves a macro average F₁ score of 67.13%, significantly outperforming the baseline produced by a simple ML classifier. Further enhancements after the evaluation period lead to an improved F₁ score of 68.10%.

2017

Large-scale evaluation of dependency-based DSMs: Are they worth the effort?
Gabriella Lapesa | Stefan Evert
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper presents a large-scale evaluation study of dependency-based distributional semantic models. We evaluate dependency-filtered and dependency-structured DSMs in a number of standard semantic similarity tasks, systematically exploring their parameter space in order to give them a “fair shot” against window-based models. Our results show that properly tuned window-based DSMs still outperform the dependency-based models in most tasks. There appears to be little need for the language-dependent resources and computational cost associated with syntactic analysis.

2016

Proceedings of the 10th Web as Corpus Workshop
Paul Cook | Stefan Evert | Roland Schäfer | Egon Stemle
Proceedings of the 10th Web as Corpus Workshop

EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora
Michael Beißwenger | Sabine Bartsch | Stefan Evert | Kay-Michael Würzner
Proceedings of the 10th Web as Corpus Workshop

Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)
Michael Zock | Alessandro Lenci | Stefan Evert
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

The CogALex-V Shared Task on the Corpus-Based Identification of Semantic Relations
Enrico Santus | Anna Gladkova | Stefan Evert | Alessandro Lenci
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

The shared task of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V) aims at providing a common benchmark for testing current corpus-based methods for the identification of lexical semantic relations (synonymy, antonymy, hypernymy, part-whole meronymy) and at gaining a better understanding of their respective strengths and weaknesses. The shared task uses a challenging dataset extracted from EVALution 1.0, which contains word pairs holding the above-mentioned relations as well as semantically unrelated control items (random). The task is split into two subtasks: (i) identification of related word pairs vs. unrelated ones; (ii) classification of the word pairs according to their semantic relation. This paper describes the subtasks, the dataset, the evaluation metrics, the seven participating systems and their results. The best performing system in subtask 1 is GHHH (F1 = 0.790), while the best system in subtask 2 is LexNet (F1 = 0.445). The dataset and the task description are available at https://sites.google.com/site/cogalex2016/home/shared-task.

CogALex-V Shared Task: Mach5 – A traditional DSM approach to semantic relatedness
Stefan Evert
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

This contribution provides a strong baseline result for the CogALex-V shared task using a traditional “count”-type DSM (placed in rank 2 out of 7 in subtask 1 and rank 3 out of 6 in subtask 2). Parameter tuning experiments reveal some surprising effects and suggest that the use of random word pairs as negative examples may be problematic, guiding the parameter optimization in an undesirable direction.

2015

SemantiKLUE: Semantic Textual Similarity with Maximum Weight Matching
Nataliia Plotnikova | Gabriella Lapesa | Thomas Proisl | Stefan Evert
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

KLUEless: Polarity Classification and Association
Nataliia Plotnikova | Micha Kohl | Kevin Volkert | Stefan Evert | Andreas Lerner | Natalie Dykes | Heiko Ermer
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Towards a better understanding of Burrows’s Delta in literary authorship attribution
Stefan Evert | Thomas Proisl | Thorsten Vitt | Christof Schöch | Fotis Jannidis | Steffen Pielström
Proceedings of the Fourth Workshop on Computational Linguistics for Literature

2014

Distributional Semantics in R with the wordspace Package
Stefan Evert
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

A Large Scale Evaluation of Distributional Semantic Models: Parameters, Interactions and Model Selection
Gabriella Lapesa | Stefan Evert
Transactions of the Association for Computational Linguistics, Volume 2

This paper presents the results of a large-scale evaluation study of window-based Distributional Semantic Models on a wide variety of tasks. Our study combines a broad coverage of model parameters with a model selection methodology that is robust to overfitting and able to capture parameter interactions. We show that our strategy allows us to identify parameter configurations that achieve good performance across different datasets and tasks.

Contrasting Syntagmatic and Paradigmatic Relations: Insights from Distributional Semantic Models
Gabriella Lapesa | Stefan Evert | Sabine Schulte im Walde
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

SemantiKLUE: Robust Semantic Similarity at Multiple Levels Using Maximum Weight Matching
Thomas Proisl | Stefan Evert | Paul Greiner | Besim Kabashi
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

SentiKLUE: Updating a Polarity Classifier in 48 Hours
Stefan Evert | Thomas Proisl | Paul Greiner | Besim Kabashi
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

SNAP: A Multi-Stage XML-Pipeline for Aspect Based Sentiment Analysis
Clemens Schulze Wettendorf | Robin Jegan | Allan Körner | Julia Zerche | Nataliia Plotnikova | Julian Moreth | Tamara Schertl | Verena Obermeyer | Susanne Streil | Tamara Willacker | Stefan Evert
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Proceedings of the 10th Workshop on Multiword Expressions (MWE)
Valia Kordoni | Markus Egg | Agata Savary | Eric Wehrli | Stefan Evert
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

NaDiR: Naive Distributional Response Generation
Gabriella Lapesa | Stefan Evert
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

2013

KLUE-CORE: A regression model of semantic textual similarity
Paul Greiner | Thomas Proisl | Stefan Evert | Besim Kabashi
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

KLUE: Simple and robust methods for polarity classification
Thomas Proisl | Paul Greiner | Stefan Evert | Besim Kabashi
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

Evaluating Neighbor Rank and Distance Measures as Predictors of Semantic Priming
Gabriella Lapesa | Stefan Evert
Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL)

2010

Distributional Semantic Models
Stefan Evert
NAACL HLT 2010 Tutorial Abstracts

Google Web 1T 5-Grams Made Easy (but not for the computer)
Stefan Evert
Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop

2008

A Lightweight and Efficient Tool for Cleaning Web Pages
Stefan Evert
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Originally conceived as a “naïve” baseline experiment using traditional n-gram language models as classifiers, the NCleaner system has turned out to be a fast and lightweight tool for cleaning Web pages with state-of-the-art accuracy (based on results from the CLEANEVAL competition held in 2007). Despite its simplicity, the algorithm achieves a significant improvement over the baseline (i.e. plain, uncleaned text dumps), trading off recall for substantially higher precision. NCleaner is available as an open-source software package. It is pre-configured for English Web pages, but can be adapted to other languages with minimal amounts of manually cleaned training data. Since NCleaner does not make use of HTML structure, it can also be applied to existing Web corpora that are only available in plain text format, with a minor loss in classification accuracy.

2007

Words and Echoes: Assessing and Mitigating the Non-Randomness Problem in Word Frequency Distribution Modeling
Marco Baroni | Stefan Evert
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

zipfR: Word Frequency Modeling in R
Stefan Evert | Marco Baroni
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

Proceedings of the Workshop on A Broader Perspective on Multiword Expressions
Nicole Gregoire | Stefan Evert | Su Nam Kim
Proceedings of the Workshop on A Broader Perspective on Multiword Expressions

2006

Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties
Begoña Villada Moirón | Aline Villavicencio | Diana McCarthy | Stefan Evert | Suzanne Stevenson
Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties

2004

Significance tests for the evaluation of ranking methods
Stefan Evert
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Identifying Morphosyntactic Preferences in Collocations
Stefan Evert | Ulrich Heid | Kristina Spranger
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The Statistical Analysis of Morphosyntactic Distributions
Stefan Evert
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Experiments on Candidate Data for Collocation Extraction
Stefan Evert | Hannah Kermes
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

YAC - A Recursive Chunker for Unrestricted German Text
Hannah Kermes | Stefan Evert
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Advanced Tools for the Study of Natural Interactivity
Claudia Soria | Niels Ole Bernsen | Niels Cadée | Jean Carletta | Laila Dybkjær | Stefan Evert | Ulrich Heid | Amy Isard | Mykola Kolodnytsky | Christoph Lauer | Wolfgang Lezius | Lucas P.J.J. Noldus | Vito Pirrelli | Norbert Reithinger | Andreas Vögele
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

Methods for the Qualitative Evaluation of Lexical Association Measures
Stefan Evert | Brigitte Krenn
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics