Piek Vossen - ACL Anthology

Piek Vossen

2025

Language Models Lack Temporal Generalization and Bigger is Not Better
Stella Verkijk | Piek Vossen | Pia Sommerauer
Findings of the Association for Computational Linguistics: ACL 2025

This paper presents elaborate testing of various LLMs on their generalization capacities. We finetune six encoder models that have been pretrained with very different data (varying in size, language, and period) on a challenging event detection task in Early Modern Dutch archival texts. Each model is finetuned with 5 seeds on 15 different data splits, resulting in 450 finetuned models. We also pre-train a domain-specific Language Model on the target domain and fine-tune and evaluate it in the same way to provide an upper bound. Our experimental setup allows us to look at underresearched aspects of generalizability, namely i) shifts at multiple places in a modeling pipeline, ii) temporal and crosslingual shifts and iii) generalization over different initializations. The results show that none of the models reaches domain-specific model performance, demonstrating their incapacity to generalize. mBERT reaches highest F1 performance, and is relatively stable over different seeds and datasplits, contrary to XLM-R. We find that contemporary Dutch models do not generalize well to Early Modern Dutch as they underperform compared to crosslingual as well as historical models. We conclude that encoder LLMs lack temporal generalization capacities and that bigger models are not better, since even a model pre-trained with five hundred GPUs on 2.5 terabytes of training data (XLM-R) underperforms considerably compared to our domain-specific model, pre-trained on one GPU and 6 GB of data. All our code, data, and the domain-specific model are openly available.

2024

An Empirical Analysis of Diversity in Argument Summarization
Michiel van der Meer | Piek Vossen | Catholijn M. Jonker | Pradeep K. Murukannaiah
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Presenting high-level arguments is a crucial task for fostering participation in online societal discussions. Current argument summarization approaches miss an important facet of this task—capturing diversity—which is important for accommodating multiple perspectives. We introduce three aspects of diversity: those of opinions, annotators, and sources. We evaluate approaches to a popular argument summarization task called Key Point Analysis, which shows how these approaches struggle to (1) represent arguments shared by few people, (2) deal with data from various sources, and (3) align with subjectivity in human-provided annotations. We find that both general-purpose LLMs and dedicated KPA models exhibit this behavior, but have complementary strengths. Further, we observe that diversification of training data may ameliorate generalization in zero-shot cases. Addressing diversity in argument summarization requires a mix of strategies to deal with subjectivity.

Unknown Script: Impact of Script on Cross-Lingual Transfer
Wondimagegnhue Tufa | Ilia Markov | Piek Vossen
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

Cross-lingual transfer has become an effective way of transferring knowledge between languages. In this paper, we explore an often overlooked aspect in this domain: the influence of the source language of a language model on language transfer performance. We consider a case where the target language and its script are not part of the pre-trained model. We conduct a series of experiments on monolingual and multilingual models that are pre-trained on different tokenization methods to determine factors that affect cross-lingual transfer to a new language with a unique script. Our findings reveal the importance of the tokenizer as a stronger factor than the shared script, language similarity, and model size.

Proceedings of the First Workshop on Reference, Framing, and Perspective @ LREC-COLING 2024
Pia Sommerauer | Tommaso Caselli | Malvina Nissim | Levi Remijnse | Piek Vossen
Proceedings of the First Workshop on Reference, Framing, and Perspective @ LREC-COLING 2024

Studying Language Variation Considering the Re-Usability of Modern Theories, Tools and Resources for Annotating Explicit and Implicit Events in Centuries Old Text
Stella Verkijk | Pia Sommerauer | Piek Vossen
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)

This paper discusses the re-usibility of existing approaches, tools and automatic techniques for the annotation and detection of events in a challenging variant of centuries old Dutch written in the archives of the Dutch East India Company. We describe our annotation process and provide a thorough analysis of different versions of manually annotated data and the first automatic results from two fine-tuned Language Models. Through the analysis of this complete process, the paper studies two things: to what extent we can use NLP theories and tasks formulated for modern English to formulate an annotation task for Early Modern Dutch and to what extent we can use NLP models and tools built for modern Dutch (and other languages) on Early Modern Dutch. We believe these analyses give us insight into how to deal with the large variation language showcases in describing events, and how this variation may differ accross domains. We release the annotation guidelines, annotated data, and code.

2023

Do Differences in Values Influence Disagreements in Online Discussions?
Michiel van der Meer | Piek Vossen | Catholijn M. Jonker | Pradeep K. Murukannaiah
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Disagreements are common in online discussions. Disagreement may foster collaboration and improve the quality of a discussion under some conditions. Although there exist methods for recognizing disagreement, a deeper understanding of factors that influence disagreement is lacking in the literature. We investigate a hypothesis that differences in personal values are indicative of disagreement in online discussions. We show how state-of-the-art models can be used for estimating values in online discussions and how the estimated values can be aggregated into value profiles. We evaluate the estimated value profiles based on human-annotated agreement labels. We find that the dissimilarity of value profiles correlates with disagreement in specific cases. We also find that including value information in agreement prediction improves performance.

Reasoning about Ambiguous Definite Descriptions
Stefan Schouten | Peter Bloem | Ilia Markov | Piek Vossen
Findings of the Association for Computational Linguistics: EMNLP 2023

Natural language reasoning plays an increasingly important role in improving language models’ ability to solve complex language understanding tasks. An interesting use case for reasoning is the resolution of context-dependent ambiguity. But no resources exist to evaluate how well Large Language Models can use explicit reasoning to resolve ambiguity in language. We propose to use ambiguous definite descriptions for this purpose and create and publish the first benchmark dataset consisting of such phrases. Our method includes all information required to resolve the ambiguity in the prompt, which means a model does not require anything but reasoning to do well. We find this to be a challenging task for recent LLMs. Code and data available at: https://github.com/sfschouten/exploiting-ambiguity

A WordNet View on Crosslingual Transformers
Wondimagegnhue Tufa | Lisa Beinborn | Piek Vossen
Proceedings of the 12th Global Wordnet Conference

WordNet is a database that represents relations between words and concepts as an abstraction of the contexts in which words are used. Contextualized language models represent words in contexts but leave the underlying concepts implicit. In this paper, we investigate how different layers of a pre-trained language model shape the abstract lexical relationship toward the actual contextual concept. Can we define the amount of contextualized concept forming needed given the abstracted representation of a word? Specifically, we consider samples of words with different polysemy profiles shared across three languages, assuming that words with a different polysemy profile require a different degree of concept shaping by context. We conduct probing experiments to investigate the impact of prior polysemy profiles on the representation in different layers. We analyze how contextualized models can approximate meaning through context and examine crosslingual interference effects.

Improving Graph-to-Text Generation Using Cycle Training
Fina Polat | Ilaria Tiddi | Paul Groth | Piek Vossen
Proceedings of the 4th Conference on Language, Data and Knowledge

Confidently Wrong: Exploring the Calibration and Expression of (Un)Certainty of Large Language Models in a Multilingual Setting
Lea Krause | Wondimagegnhue Tufa | Selene Baez Santamaria | Angel Daza | Urja Khurana | Piek Vossen
Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023)

While the fluency and coherence of Large Language Models (LLMs) in text generation have seen significant improvements, their competency in generating appropriate expressions of uncertainty remains limited.Using a multilingual closed-book QA task and GPT-3.5, we explore how well LLMs are calibrated and express certainty across a diverse set of languages, including low-resource settings. Our results reveal strong performance in high-resource languages but a marked decline in performance in lower-resource languages. Across all, we observe an exaggerated expression of confidence in the model, which does not align with the correctness or likelihood of its responses. Our findings highlight the need for further research into accurate calibration of LLMs especially in a multilingual setting.

2022

Probing the representations of named entities in Transformer-based Language Models
Stefan Schouten | Peter Bloem | Piek Vossen
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

In this work we analyze the named entity representations learned by Transformer-based language models. We investigate the role entities play in two tasks: a language modeling task, and a sequence classification task. For this purpose we collect a novel news topic classification dataset with 12 topics called RefNews-12. We perform two complementary methods of analysis. First, we use diagnostic models allowing us to quantify to what degree entity information is present in the hidden representations. Second, we perform entity mention substitution to measure how substitute-entities with different properties impact model performance. By controlling for model uncertainty we are able to show that entities are identified, and depending on the task, play a measurable role in the model’s predictions. Additionally, we show that the entities’ types alone are not enough to account for this. Finally, we find that the the frequency with which entities occur are important for the masked language modeling task, and that the entities’ distributions over topics are important for topic classification.

Evaluating Agent Interactions Through Episodic Knowledge Graphs
Selene Baez Santamaria | Piek Vossen | Thomas Baier
Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge

We present a new method based on episodic Knowledge Graphs (eKGs) for evaluating (multimodal) conversational agents in open domains. This graph is generated by interpreting raw signals during conversation and is able to capture the accumulation of knowledge over time. We apply structural and semantic analysis of the resulting graphs and translate the properties into qualitative measures. We compare these measures with existing automatic and manual evaluation metrics commonly used for conversational agents. Our results show that our Knowledge-Graph-based evaluation provides more qualitative insights into interaction and the agent’s behavior.

The Role of Common Ground for Referential Expressions in Social Dialogues
Jaap Kruijt | Piek Vossen
Proceedings of the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference

In this paper, we frame the problem of co-reference resolution in dialogue as a dynamic social process in which mentions to people previously known and newly introduced are mixed when people know each other well. We restructured an existing data set for the Friends sitcom as a coreference task that evolves over time, where close friends make reference to other people either part of their common ground (inner circle) or not (outer circle). We expect that awareness of common ground is key in social dialogue in order to resolve references to the inner social circle, whereas local contextual information plays a more important role for outer circle mentions. Our analysis of these references confirms that there are differences in naming and introducing these people. We also experimented with the SpanBERT coreference system with and without fine-tuning to measure whether preceding discourse contexts matter for resolving inner and outer circle mentions. Our results show that more inner circle mentions lead to a decrease in model performance, and that fine-tuning on preceding contexts reduces false negatives for both inner and outer circle mentions but increases the false positives as well, showing that the models overfit on these contexts.

Introducing Frege to Fillmore: A FrameNet Dataset that Captures both Sense and Reference
Levi Remijnse | Piek Vossen | Antske Fokkens | Sam Titarsolej
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This article presents the first output of the Dutch FrameNet annotation tool, which facilitates both referential- and frame annotations of language-independent corpora. On the referential level, the tool links in-text mentions to structured data, grounding the text in the real world. On the frame level, those same mentions are annotated with respect to their semantic sense. This way of annotating not only generates a rich linguistic dataset that is grounded in real-world event instances, but also guides the annotators in frame identification, resulting in high inter-annotator-agreement and consistent annotations across documents and at discourse level, exceeding traditional sentence level annotations of frame elements. Moreover, the annotation tool features a dynamic lexical lookup that increases the development of a cross-domain FrameNet lexicon.

Efficiently and Thoroughly Anonymizing a Transformer Language Model for Dutch Electronic Health Records: a Two-Step Method
Stella Verkijk | Piek Vossen
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Neural Network (NN) architectures are used more and more to model large amounts of data, such as text data available online. Transformer-based NN architectures have shown to be very useful for language modelling. Although many researchers study how such Language Models (LMs) work, not much attention has been paid to the privacy risks of training LMs on large amounts of data and publishing them online. This paper presents a new method for anonymizing a language model by presenting the way in which MedRoBERTa.nl, a Dutch language model for hospital notes, was anonymized. The two-step method involves i) automatic anonymization of the training data and ii) semi-automatic anonymization of the LM’s vocabulary. Adopting the fill-mask task where the model predicts what tokens are most probable in a certain context, it was tested how often the model will predict a name in a context where a name should be. It was shown that it predicts a name-like token 0.2% of the time. Any name-like token that was predicted was never the name originally present in the training data. By explaining how a LM trained on highly private real-world medical data can be published, we hope that more language resources will be published openly and responsibly so the scientific community can profit from them.

Modeling Dutch Medical Texts for Detecting Functional Categories and Levels of COVID-19 Patients
Jenia Kim | Stella Verkijk | Edwin Geleijn | Marieke van der Leeden | Carel Meskers | Caroline Meskers | Sabina van der Veen | Piek Vossen | Guy Widdershoven
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Electronic Health Records contain a lot of information in natural language that is not expressed in the structured clinical data. Especially in the case of new diseases such as COVID-19, this information is crucial to get a better understanding of patient recovery patterns and factors that may play a role in it. However, the language in these records is very different from standard language and generic natural language processing tools cannot easily be applied out-of-the-box. In this paper, we present a fine-tuned Dutch language model specifically developed for the language in these health records that can determine the functional level of patients according to a standard coding framework from the World Health Organization. We provide evidence that our classification performs at a sufficient level to generate patient recovery patterns that can be used in the future to analyse factors that contribute to the rehabilitation of COVID-19 patients and to predict individual patient recovery of functioning.

SeqL at SemEval-2022 Task 11: An Ensemble of Transformer Based Models for Complex Named Entity Recognition Task
Fadi Hassan | Wondimagegnhue Tufa | Guillem Collell | Piek Vossen | Lisa Beinborn | Adrian Flanagan | Kuan Eeik Tan
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper presents our system used to participate in task 11 (MultiCONER) of the SemEval 2022 competition. Our system ranked fourth place in track 12 (Multilingual) and fifth place in track 13 (Code-Mixed). The goal of track 12 is to detect complex named entities in a multilingual setting, while track 13 is dedicated to detecting complex named entities in a code-mixed setting. Both systems were developed using transformer-based language models. We used an ensemble of XLM-RoBERTa-large and Microsoft/infoxlm-large with a Conditional Random Field (CRF) layer. In addition, we describe the algorithms employed to train our models and our hyper-parameter selection. We furthermore study the impact of different methods to aggregate the outputs of the individual models that compose our ensemble. Finally, we present an extensive analysis of the results and errors.

Annotating Targets of Toxic Language at the Span Level
Baran Barbarestani | Isa Maks | Piek Vossen
Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022)

In this paper, we discuss an interpretable framework to integrate toxic language annotations. Most data sets address only one aspect of the complex relationship in toxic communication and are inconsistent with each other. Enriching annotations with more details and information is however of great importance in order to develop high-performing and comprehensive explainable language models. Such systems should recognize and interpret both expressions that are toxic as well as expressions that make reference to specific targets to combat toxic language. We therefore created a crowd-annotation task to mark the spans of words that refer to target communities as an extension of the HateXplain data set. We present a quantitative and qualitative analysis of the annotations. We also fine-tuned RoBERTa-base on our data and experimented with different data thresholds to measure their effect on the classification. The F1-score of our best model on the test set is 79%. The annotations are freely available and can be combined with the existing HateXplain annotation to build richer and more complete models.

2021

Proceedings of the 11th Global Wordnet Conference
Piek Vossen | Christiane Fellbaum
Proceedings of the 11th Global Wordnet Conference

Variation in framing as a function of temporal reporting distance
Levi Remijnse | Marten Postma | Piek Vossen
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

In this paper, we measure variation in framing as a function of foregrounding and backgrounding in a co-referential corpus with a range of temporal distance. In one type of experiment, frame-annotated corpora grouped under event types were contrasted, resulting in a ranking of frames with typicality rates. In contrasting between publication dates, a different ranking of frames emerged for documents that are close to or far from the event instance. In the second type of analysis, we trained a diagnostic classifier with frame occurrences in order to let it differentiate documents based on their temporal distance class (close to or far from the event instance). The classifier performs above chance and outperforms models with words.

Batavia asked for advice. Pretrained language models for Named Entity Recognition in historical texts.
Sophie I. Arnoult | Lodewijk Petram | Piek Vossen
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Pretrained language models like BERT have advanced the state of the art for many NLP tasks. For resource-rich languages, one has the choice between a number of language-specific models, while multilingual models are also worth considering. These models are well known for their crosslingual performance, but have also shown competitive in-language performance on some tasks. We consider monolingual and multilingual models from the perspective of historical texts, and in particular for texts enriched with editorial notes: how do language models deal with the historical and editorial content in these texts? We present a new Named Entity Recognition dataset for Dutch based on 17th and 18th century United East India Company (VOC) reports extended with modern editorial notes. Our experiments with multilingual and Dutch pretrained language models confirm the crosslingual abilities of multilingual models while showing that all language models can leverage mixed-variant data. In particular, language models successfully incorporate notes for the prediction of entities in historical texts. We also find that multilingual models outperform monolingual models on our data, but that this superiority is linked to the task at hand: multilingual models lose their advantage when confronted with more semantical tasks.

EMISSOR: A platform for capturing multimodal interactions as Episodic Memories and Interpretations with Situated Scenario-based Ontological References
Selene Baez Santamaria | Thomas Baier | Taewoon Kim | Lea Krause | Jaap Kruijt | Piek Vossen
Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR)

We present EMISSOR: a platform to capture multimodal interactions as recordings of episodic experiences with explicit referential interpretations that also yield an episodic Knowledge Graph (eKG). The platform stores streams of multiple modalities as parallel signals. Each signal is segmented and annotated independently with interpretation. Annotations are eventually mapped to explicit identities and relations in the eKG. As we ground signal segments from different modalities to the same instance representations, we also ground different modalities across each other. Unique to our eKG is that it accepts different interpretations across modalities, sources and experiences and supports reasoning over conflicting information and uncertainties that may result from multimodal experiences. EMISSOR can record and annotate experiments in virtual and real-world, combine data, evaluate system behavior and their performance for preset goals but also model the accumulation of knowledge and interpretations in the Knowledge Graph as a result of these episodic experiences.

2020

Would you describe a leopard as yellow? Evaluating crowd-annotations with justified and informative disagreement
Pia Sommerauer | Antske Fokkens | Piek Vossen
Proceedings of the 28th International Conference on Computational Linguistics

Semantic annotation tasks contain ambiguity and vagueness and require varying degrees of world knowledge. Disagreement is an important indication of these phenomena. Most traditional evaluation methods, however, critically hinge upon the notion of inter-annotator agreement. While alternative frameworks have been proposed, they do not move beyond agreement as the most important indicator of quality. Critically, evaluations usually do not distinguish between instances in which agreement is expected and instances in which disagreement is not only valid but desired because it captures the linguistic and cognitive phenomena in the data. We attempt to overcome these limitations using the example of a dataset that provides semantic representations for diagnostic experiments on language models. Ambiguity, vagueness, and difficulty are not only highly relevant for this use-case, but also play an important role in other types of semantic annotation tasks. We establish an additional, agreement-independent quality metric based on answer-coherence and evaluate it in comparison to existing metrics. We compare against a gold standard and evaluate on expected disagreement. Despite generally low agreement, annotations follow expected behavior and have high accuracy when selected based on coherence. We show that combining different quality metrics enables a more comprehensive evaluation than relying exclusively on agreement.

Combining Conceptual and Referential Annotation to Study Variation in Framing
Marten Postma | Levi Remijnse | Filip Ilievski | Antske Fokkens | Sam Titarsolej | Piek Vossen
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet

We introduce an annotation tool whose purpose is to gain insights into variation of framing by combining FrameNet annotation with referential annotation. English FrameNet enables researchers to study variation in framing at the conceptual level as well through its packaging in language. We enrich FrameNet annotations in two ways. First, we introduce the referential aspect. Secondly, we annotate on complete texts to encode connections between mentions. As a result, we can analyze the variation of framing for one particular event across multiple mentions and (cross-lingual) documents. We can examine how an event is framed over time and how core frame elements are expressed throughout a complete text. The data model starts with a representation of an event type. Each event type has many incidents linked to it, and each incident has several reference texts describing it as well as structured data about the incident. The user can apply two types of annotations: 1) mappings from expressions to frames and frame elements, 2) reference relations from mentions to events and participants of the structured data.

Large-scale Cross-lingual Language Resources for Referencing and Framing
Piek Vossen | Filip Ilievski | Marten Postma | Antske Fokkens | Gosse Minnema | Levi Remijnse
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this article, we lay out the basic ideas and principles of the project Framing Situations in the Dutch Language. We provide our first results of data acquisition, together with the first data release. We introduce the notion of cross-lingual referential corpora. These corpora consist of texts that make reference to exactly the same incidents. The referential grounding allows us to analyze the framing of these incidents in different languages and across different texts. During the project, we will use the automatically generated data to study linguistic framing as a phenomenon, build framing resources such as lexicons and corpora. We expect to capture larger variation in framing compared to traditional approaches for building such resources. Our first data release, which contains structured data about a large number of incidents and reference texts, can be found at http://dutchframenet.nl/data-releases/.

Annotating Perspectives on Vaccination
Roser Morante | Chantal van Son | Isa Maks | Piek Vossen
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper we present the Vaccination Corpus, a corpus of texts related to the online vaccination debate that has been annotated with three layers of information about perspectives: attribution, claims and opinions. Additionally, events related to the vaccination debate are also annotated. The corpus contains 294 documents from the Internet which reflect different views on vaccinations. It has been compiled to study the language of online debates, with the final goal of experimenting with methodologies to extract and contrast perspectives in the framework of the vaccination debate.

A Shared Task of a New, Collaborative Type to Foster Reproducibility: A First Exercise in the Area of Language Science and Technology with REPROLANG2020
António Branco | Nicoletta Calzolari | Piek Vossen | Gertjan Van Noord | Dieter van Uytvanck | João Silva | Luís Gomes | André Moreira | Willem Elbers
Proceedings of the Twelfth Language Resources and Evaluation Conference

n this paper, we introduce a new type of shared task — which is collaborative rather than competitive — designed to support and fosterthe reproduction of research results. We also describe the first event running such a novel challenge, present the results obtained, discussthe lessons learned and ponder on future undertakings.

When to explain: Identifying explanation triggers in human-agent interaction
Lea Krause | Piek Vossen
2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence

With more agents deployed than ever, users need to be able to interact and cooperate with them in an effective and comfortable manner. Explanations have been shown to increase the understanding and trust of a user in human-agent interaction. There have been numerous studies investigating this effect, but they rely on the user explicitly requesting an explanation. We propose a first overview of when an explanation should be triggered and show that there are many instances that would be missed if the agent solely relies on direct questions. For this, we differentiate between direct triggers such as commands or questions and introduce indirect triggers like confusion or uncertainty detection.

2019

Proceedings of the 10th Global Wordnet Conference
Piek Vossen | Christiane Fellbaum
Proceedings of the 10th Global Wordnet Conference

Towards interpretable, data-derived distributional meaning representations for reasoning: A dataset of properties and concepts
Pia Sommerauer | Antske Fokkens | Piek Vossen
Proceedings of the 10th Global Wordnet Conference

This paper proposes a framework for investigating which types of semantic properties are represented by distributional data. The core of our framework consists of relations between concepts and properties. We provide hypotheses on which properties are reflected in distributional data or not based on the type of relation. We outline strategies for creating a dataset of positive and negative examples for various semantic properties, which cannot easily be separated on the basis of general similarity (e.g. fly: seagull, penguin). This way, a distributional model can only distinguish between positive and negative examples through evidence for a target property. Once completed, this dataset can be used to test our hypotheses and work towards data-derived interpretable representations.

2018

Proceedings of the 9th Global Wordnet Conference
Francis Bond | Piek Vossen | Christiane Fellbaum
Proceedings of the 9th Global Wordnet Conference

ReferenceNet: a semantic-pragmatic network for capturing reference relations.
Piek Vossen | Filip Ilievski | Marten Postrma
Proceedings of the 9th Global Wordnet Conference

In this paper, we present ReferenceNet: a semantic-pragmatic network of reference relations between synsets. Synonyms are assumed to be exchangeable in similar contexts and also word embeddings are based on sharing of local contexts represented as vectors. Co-referring words, however, tend to occur in the same topical context but in different local contexts. In addition, they may express different concepts related through topical coherence, and through author framing and perspective. In this paper, we describe how reference relations can be added to WordNet and how they can be acquired. We evaluate two methods of extracting event coreference relations using WordNet relations against a manual annotation of 38 documents within the same topical domain of gun violence. We conclude that precision is reasonable but recall is lower because the WordNet hierarchy does not sufficiently capture the required coherence and perspective relations.

A Deep Dive into Word Sense Disambiguation with LSTM
Minh Le | Marten Postma | Jacopo Urbani | Piek Vossen
Proceedings of the 27th International Conference on Computational Linguistics

LSTM-based language models have been shown effective in Word Sense Disambiguation (WSD). In particular, the technique proposed by Yuan et al. (2016) returned state-of-the-art performance in several benchmarks, but neither the training data nor the source code was released. This paper presents the results of a reproduction study and analysis of this technique using only openly available datasets (GigaWord, SemCor, OMSTI) and software (TensorFlow). Our study showed that similar results can be obtained with much less data than hinted at by Yuan et al. (2016). Detailed analyses shed light on the strengths and weaknesses of this method. First, adding more unannotated training data is useful, but is subject to diminishing returns. Second, the model can correctly identify both popular and unpopular meanings. Finally, the limited sense coverage in the annotated datasets is a major limitation. All code and trained models are made freely available.

Systematic Study of Long Tail Phenomena in Entity Linking
Filip Ilievski | Piek Vossen | Stefan Schlobach
Proceedings of the 27th International Conference on Computational Linguistics

State-of-the-art entity linkers achieve high accuracy scores with probabilistic methods. However, these scores should be considered in relation to the properties of the datasets they are evaluated on. Until now, there has not been a systematic investigation of the properties of entity linking datasets and their impact on system performance. In this paper we report on a series of hypotheses regarding the long tail phenomena in entity linking datasets, their interaction, and their impact on system performance. Our systematic study of these hypotheses shows that evaluation datasets mainly capture head entities and only incidentally cover data from the tail, thus encouraging systems to overfit to popular/frequent and non-ambiguous cases. We find the most difficult cases of entity linking among the infrequent candidates of ambiguous forms. With our findings, we hope to inspire future designs of both entity linking systems and evaluation datasets. To support this goal, we provide a list of recommended actions for better inclusion of tail cases.

Measuring the Diversity of Automatic Image Descriptions
Emiel van Miltenburg | Desmond Elliott | Piek Vossen
Proceedings of the 27th International Conference on Computational Linguistics

Automatic image description systems typically produce generic sentences that only make use of a small subset of the vocabulary available to them. In this paper, we consider the production of generic descriptions as a lack of diversity in the output, which we quantify using established metrics and two new metrics that frame image description as a word recall task. This framing allows us to evaluate system performance on the head of the vocabulary, as well as on the long tail, where system performance degrades. We use these metrics to examine the diversity of the sentences generated by nine state-of-the-art systems on the MS COCO data set. We find that the systems trained with maximum likelihood objectives produce less diverse output than those trained with additional adversarial objectives. However, the adversarially-trained models only produce more types from the head of the vocabulary and not the tail. Besides vocabulary-based methods, we also look at the compositional capacity of the systems, specifically their ability to create compound nouns and prepositional phrases of different lengths. We conclude that there is still much room for improvement, and offer a toolkit to measure progress towards the goal of generating more diverse image descriptions.

Scoring and Classifying Implicit Positive Interpretations: A Challenge of Class Imbalance
Chantal van Son | Roser Morante | Lora Aroyo | Piek Vossen
Proceedings of the 27th International Conference on Computational Linguistics

This paper reports on a reimplementation of a system on detecting implicit positive meaning from negated statements. In the original regression experiment, different positive interpretations per negation are scored according to their likelihood. We convert the scores to classes and report our results on both the regression and classification tasks. We show that a baseline taking the mean score or most frequent class is hard to beat because of class imbalance in the dataset. Our error analysis indicates that an approach that takes the information structure into account (i.e. which information is new or contrastive) may be promising, which requires looking beyond the syntactic and semantic characteristics of negated statements.

Resource Interoperability for Sustainable Benchmarking: The Case of Events
Chantal van Son | Oana Inel | Roser Morante | Lora Aroyo | Piek Vossen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Don’t Annotate, but Validate: a Data-to-Text Method for Capturing Event Data
Piek Vossen | Filip Ilievski | Marten Postma | Roxane Segers
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The Circumstantial Event Ontology (CEO) and ECB+/CEO: an Ontology and Corpus for Implicit Causal Relations between Events
Roxane Segers | Tommaso Caselli | Piek Vossen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

SemEval-2018 Task 5: Counting Events and Participants in the Long Tail
Marten Postma | Filip Ilievski | Piek Vossen
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper discusses SemEval-2018 Task 5: a referential quantification task of counting events and participants in local, long-tail news documents with high ambiguity. The complexity of this task challenges systems to establish the meaning, reference and identity across documents. The task consists of three subtasks and spans across three domains. We detail the design of this referential quantification task, describe the participating systems, and present additional analysis to gain deeper insight into their performance.

NewsReader at SemEval-2018 Task 5: Counting events by reasoning over event-centric-knowledge-graphs
Piek Vossen
Proceedings of the 12th International Workshop on Semantic Evaluation

In this paper, we describe the participation of the NewsReader system in the SemEval-2018 Task 5 on Counting Events and Participants in the Long Tail. NewsReader is a generic unsupervised text processing system that detects events with participants, time and place to generate Event Centric Knowledge Graphs (ECKGs). We minimally adapted these ECKGs to establish a baseline performance for the task. We first use the ECKGs to establish which documents report on the same incident and what event mentions are coreferential. Next, we aggregate ECKGs across coreferential mentions and use the aggregated knowledge to answer the questions of the task. Our participation tests the quality of NewsReader to create ECKGs, as well as the potential of ECKGs to establish event identity and reason over the result to answer the task queries.

Meaning_space at SemEval-2018 Task 10: Combining explicitly encoded knowledge with information extracted from word embeddings
Pia Sommerauer | Antske Fokkens | Piek Vossen
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper presents the two systems submitted by the meaning space team in Task 10 of the SemEval competition 2018 entitled Capturing discriminative attributes. The systems consist of combinations of approaches exploiting explicitly encoded knowledge about concepts in WordNet and information encoded in distributional semantic vectors. Rather than aiming for high performance, we explore which kind of semantic knowledge is best captured by different methods. The results indicate that WordNet glosses on different levels of the hierarchy capture many attributes relevant for this task. In combination with exploiting word embedding similarities, this source of information yielded our best results. Our best performing system ranked 5th out of 13 final ranks. Our analysis yields insights into the different kinds of attributes represented by different sources of knowledge.

Proceedings of the Workshop Events and Stories in the News 2018
Tommaso Caselli | Ben Miller | Marieke van Erp | Piek Vossen | Martha Palmer | Eduard Hovy | Teruko Mitamura | David Caswell | Susan W. Brown | Claire Bonial
Proceedings of the Workshop Events and Stories in the News 2018

Talking about other people: an endless range of possibilities
Emiel van Miltenburg | Desmond Elliott | Piek Vossen
Proceedings of the 11th International Conference on Natural Language Generation

Image description datasets, such as Flickr30K and MS COCO, show a high degree of variation in the ways that crowd-workers talk about the world. Although this gives us a rich and diverse collection of data to work with, it also introduces uncertainty about how the world should be described. This paper shows the extent of this uncertainty in the PEOPLE-domain. We present a taxonomy of different ways to talk about other people. This taxonomy serves as a reference point to think about how other people should be described, and can be used to classify and compute statistics about labels applied to people.

2017

Proceedings of the Events and Stories in the News Workshop
Tommaso Caselli | Ben Miller | Marieke van Erp | Piek Vossen | Martha Palmer | Eduard Hovy | Teruko Mitamura | David Caswell
Proceedings of the Events and Stories in the News Workshop

The Circumstantial Event Ontology (CEO)
Roxane Segers | Tommaso Caselli | Piek Vossen
Proceedings of the Events and Stories in the News Workshop

In this paper we describe the ongoing work on the Circumstantial Event Ontology (CEO), a newly developed ontology for calamity events that models semantic circumstantial relations between event classes. The circumstantial relations are designed manually, based on the shared properties of each event class. We discuss and contrast two types of event circumstantial relations: semantic circumstantial relations and episodic circumstantial relations. Further, we show the metamodel and the current contents of the ontology and outline the evaluation of the CEO.

The Event StoryLine Corpus: A New Benchmark for Causal and Temporal Relation Extraction
Tommaso Caselli | Piek Vossen
Proceedings of the Events and Stories in the News Workshop

This paper reports on the Event StoryLine Corpus (ESC) v1.0, a new benchmark dataset for the temporal and causal relation detection. By developing this dataset, we also introduce a new task, the StoryLine Extraction from news data, which aims at extracting and classifying events relevant for stories, from across news documents spread in time and clustered around a single seminal event or topic. In addition to describing the dataset, we also report on three baselines systems whose results show the complexity of the task and suggest directions for the development of more robust systems.

Cross-linguistic differences and similarities in image descriptions
Emiel van Miltenburg | Desmond Elliott | Piek Vossen
Proceedings of the 10th International Conference on Natural Language Generation

Automatic image description systems are commonly trained and evaluated on large image description datasets. Recently, researchers have started to collect such datasets for languages other than English. An unexplored question is how different these datasets are from English and, if there are any differences, what causes them to differ. This paper provides a cross-linguistic comparison of Dutch, English, and German image descriptions. We find that these descriptions are similar in many respects, but the familiarity of crowd workers with the subjects of the images has a noticeable influence on the specificity of the descriptions.

Storyteller: Visual Analytics of Perspectives on Rich Text Interpretations
Maarten van Meersbergen | Piek Vossen | Janneke van der Zwaan | Antske Fokkens | Willem van Hage | Inger Leemans | Isa Maks
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

Complexity of event data in texts makes it difficult to assess its content, especially when considering larger collections in which different sources report on the same or similar situations. We present a system that makes it possible to visually analyze complex event and emotion data extracted from texts. We show that we can abstract from different data models for events and emotions to a single data model that can show the complex relations in four dimensions. The visualization has been applied to analyze 1) dynamic developments in how people both conceive and express emotions in theater plays and 2) how stories are told from the perspectyive of their sources based on rich event data extracted from news or biographies.

GRaSP: Grounded Representation and Source Perspective
Antske Fokkens | Piek Vossen | Marco Rospocher | Rinke Hoekstra | Willem Robert van Hage
Proceedings of the Workshop Knowledge Resources for the Socio-Economic Sciences and Humanities associated with RANLP 2017

When people or organizations provide information, they make choices regarding what information they include and how they present it. The combination of these two aspects (the content and stance provided by the source) represents a perspective. Investigating differences in perspective can provide various useful insights in the reliability of information, the way perspectives change over time, shared beliefs among groups of a similar social or political background and contrasts between other groups, etc. This paper introduces GRaSP, a generic framework for modeling perspectives and their sources.

2016

Proceedings of the 8th Global WordNet Conference (GWC)
Christiane Fellbaum | Piek Vossen | Verginica Barbu Mititelu | Corina Forascu
Proceedings of the 8th Global WordNet Conference (GWC)

CILI: the Collaborative Interlingual Index
Francis Bond | Piek Vossen | John P. McCrae | Christiane Fellbaum
Proceedings of the 8th Global WordNet Conference (GWC)

This paper introduces the motivation for and design of the Collaborative InterLingual Index (CILI). It is designed to make possible coordination between multiple loosely coupled wordnet projects. The structure of the CILI is based on the Interlingual index first proposed in the EuroWordNet project with several pragmatic extensions: an explicit open license, definitions in English and links to wordnets in the Global Wordnet Grid.

Open Dutch WordNet
Marten Postma | Emiel van Miltenburg | Roxane Segers | Anneleen Schoen | Piek Vossen
Proceedings of the 8th Global WordNet Conference (GWC)

We describe Open Dutch WordNet, which has been derived from the Cornetto database, the Princeton WordNet and open source resources. We exploited existing equivalence relations between Cornetto synsets and WordNet synsets in order to move the open source content from Cornetto into WordNet synsets. Currently, Open Dutch Wordnet contains 117,914 synsets, of which 51,588 synsets contain at least one Dutch synonym, which leaves 66,326 synsets still to obtain a Dutch synonym. The average polysemy is 1.5. The resource is currently delivered in XML under the CC BY-SA 4.0 license1 and it has been linked to the Global Wordnet Grid. In order to use the resource, we refer to: https: //github.com/MartenPostma/OpenDutchWordnet.

The Predicate Matrix and the Event and Implied Situation Ontology: Making More of Events
Roxane Segers | Egoitz Laparra | Marco Rospocher | Piek Vossen | German Rigau | Filip Ilievski
Proceedings of the 8th Global WordNet Conference (GWC)

This paper presents the Event and Implied Situation Ontology (ESO), a resource which formalizes the pre and post situations of events and the roles of the entities affected by an event. The ontology reuses and maps across existing resources such as WordNet, SUMO, VerbNet, PropBank and FrameNet. We describe how ESO is injected into a new version of the Predicate Matrix and illustrate how these resources are used to detect information in large document collections that otherwise would have remained implicit. The model targets interpretations of situations rather than the semantics of verbs per se. The event is interpreted as a situation using RDF taking all event components into account. Hence, the ontology and the linked resources need to be considered from the perspective of this interpretation model.

Toward a truly multilingual GlobalWordnet Grid
Piek Vossen | Francis Bond | John P. McCrae
Proceedings of the 8th Global WordNet Conference (GWC)

In this paper, we describe a new and improved Global Wordnet Grid that takes advantage of the Collaborative InterLingual Index (CILI). Currently, the Open Multilingal Wordnet has made many wordnets accessible as a single linked wordnet, but as it used the Princeton Wordnet of English (PWN) as a pivot, it loses concepts that are not part of PWN. The technical solution to this, a central registry of concepts, as proposed in the EuroWordnet project through the InterLingual Index, has been known for many years. However, the practical issues of how to host this index and who decides what goes in remained unsolved. Inspired by current practice in the Semantic Web and the Linked Open Data community, we propose a way to solve this issue. In this paper we define the principles and protocols for contributing to the Grid. We tested them on two use cases, adding version 3.1 of the Princeton WordNet to a CILI based on 3.0 and adding the Open Dutch Wordnet, to validate the current set up. This paper aims to be a call for action that we hope will be further discussed and ultimately taken up by the whole wordnet community.

Semantic overfitting: what ‘world’ do we consider when evaluating disambiguation of text?
Filip Ilievski | Marten Postma | Piek Vossen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Semantic text processing faces the challenge of defining the relation between lexical expressions and the world to which they make reference within a period of time. It is unclear whether the current test sets used to evaluate disambiguation tasks are representative for the full complexity considering this time-anchored relation, resulting in semantic overfitting to a specific period and the frequent phenomena within. We conceptualize and formalize a set of metrics which evaluate this complexity of datasets. We provide evidence for their applicability on five different disambiguation tasks. To challenge semantic overfitting of disambiguation systems, we propose a time-based, metric-aware method for developing datasets in a systematic and semi-automated manner, as well as an event-based QA task.

More is not always better: balancing sense distributions for all-words Word Sense Disambiguation
Marten Postma | Ruben Izquierdo Bevia | Piek Vossen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Current Word Sense Disambiguation systems show an extremely poor performance on low frequent senses, which is mainly caused by the difference in sense distributions between training and test data. The main focus in tackling this problem has been on acquiring more data or selecting a single predominant sense and not necessarily on the meta properties of the data itself. We demonstrate that these properties, such as the volume, provenance, and balancing, play an important role with respect to system performance. In this paper, we describe a set of experiments to analyze these meta properties in the framework of a state-of-the-art WSD system when evaluated on the SemEval-2013 English all-words dataset. We show that volume and provenance are indeed important, but that approximating the perfect balancing of the selected training data leads to an improvement of 21 points and exceeds state-of-the-art systems by 14 points while using only simple features. We therefore conclude that unsupervised acquisition of training data should be guided by strategies aimed at matching meta properties.

GRaSP: A Multilayered Annotation Scheme for Perspectives
Chantal van Son | Tommaso Caselli | Antske Fokkens | Isa Maks | Roser Morante | Lora Aroyo | Piek Vossen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a framework and methodology for the annotation of perspectives in text. In the last decade, different aspects of linguistic encoding of perspectives have been targeted as separated phenomena through different annotation initiatives. We propose an annotation scheme that integrates these different phenomena. We use a multilayered annotation approach, splitting the annotation of different aspects of perspectives into small subsequent subtasks in order to reduce the complexity of the task and to better monitor interactions between layers. Currently, we have included four layers of perspective annotation: events, attribution, factuality and opinion. The annotations are integrated in a formal model called GRaSP, which provides the means to represent instances (e.g. events, entities) and propositions in the (real or assumed) world in relation to their mentions in text. Then, the relation between the source and target of a perspective is characterized by means of perspective annotations. This enables us to place alternative perspectives on the same entity, event or proposition next to each other.

The Event and Implied Situation Ontology (ESO): Application and Evaluation
Roxane Segers | Marco Rospocher | Piek Vossen | Egoitz Laparra | German Rigau | Anne-Lyse Minard
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the Event and Implied Situation Ontology (ESO), a manually constructed resource which formalizes the pre and post situations of events and the roles of the entities affected by an event. The ontology is built on top of existing resources such as WordNet, SUMO and FrameNet. The ontology is injected to the Predicate Matrix, a resource that integrates predicate and role information from amongst others FrameNet, VerbNet, PropBank, NomBank and WordNet. We illustrate how these resources are used on large document collections to detect information that otherwise would have remained implicit. The ontology is evaluated on two aspects: recall and precision based on a manually annotated corpus and secondly, on the quality of the knowledge inferred by the situation assertions in the ontology. Evaluation results on the quality of the system show that 50% of the events typed and enriched with ESO assertions are correct.

Addressing the MFS Bias in WSD systems
Marten Postma | Ruben Izquierdo | Eneko Agirre | German Rigau | Piek Vossen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Word Sense Disambiguation (WSD) systems tend to have a strong bias towards assigning the Most Frequent Sense (MFS), which results in high performance on the MFS but in a very low performance on the less frequent senses. We addressed the MFS bias in WSD systems by combining the output from a WSD system with a set of mostly static features to create a MFS classifier to decide when to and not to choose the MFS. The output from this MFS classifier, which is based on the Random Forest algorithm, is then used to modify the output from the original WSD system. We applied our classifier to one of the state-of-the-art supervised WSD systems, i.e. IMS, and to of the best state-of-the-art unsupervised WSD systems, i.e. UKB. Our main finding is that we are able to improve the system output in terms of choosing between the MFS and the less frequent senses. When we apply the MFS classifier to fine-grained WSD, we observe an improvement on the less frequent sense cases, whereas we maintain the overall recall.

Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Alexandra Balahur | Erik van der Goot | Piek Vossen | Andres Montoyo
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Unshared Task at the 3rd Workshop on Argument Mining: Perspective Based Local Agreement and Disagreement in Online Debate
Chantal van Son | Tommaso Caselli | Antske Fokkens | Isa Maks | Roser Morante | Lora Aroyo | Piek Vossen
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)
Key-Sun Choi | Christina Unger | Piek Vossen | Jin-Dong Kim | Noriko Kando | Axel-Cyrille Ngonga Ngomo
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016)
Tommaso Caselli | Ben Miller | Marieke van Erp | Piek Vossen | David Caswell
Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016)

The Storyline Annotation and Representation Scheme (StaR): A Proposal
Tommaso Caselli | Piek Vossen
Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016)

Moving away from semantic overfitting in disambiguation datasets
Marten Postma | Filip Ilievski | Piek Vossen | Marieke van Erp
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods

2015

VUA-background : When to Use Background Information to Perform Word Sense Disambiguation
Marten Postma | Ruben Izquierdo | Piek Vossen
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

SPINOZA_VU: An NLP Pipeline for Cross Document TimeLines
Tommaso Caselli | Antske Fokkens | Roser Morante | Piek Vossen
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Translating Granularity of Event Slots into Features for Event Coreference Resolution.
Agata Cybulska | Piek Vossen
Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation

Semantic Interoperability for Cross-lingual and cross-document Event Detection
Piek Vossen | Egoitz Laparra | German Rigau | Itziar Aldabe
Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation

Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Alexandra Balahur | Erik van der Goot | Piek Vossen | Andres Montoyo
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Proceedings of the First Workshop on Computing News Storylines
Tommaso Caselli | Marieke van Erp | Anne-Lyse Minard | Mark Finlayson | Ben Miller | Jordi Atserias | Alexandra Balahur | Piek Vossen
Proceedings of the First Workshop on Computing News Storylines

Storylines for structuring massive streams of news
Piek Vossen | Tommaso Caselli | Yiota Kontzopoulou
Proceedings of the First Workshop on Computing News Storylines

Proceedings of the Second Workshop on Natural Language Processing and Linked Open Data
Piek Vossen | German Rigau | Petya Osenova | Kiril Simov
Proceedings of the Second Workshop on Natural Language Processing and Linked Open Data

2014

BiographyNet: Methodological Issues when NLP supports historical research
Antske Fokkens | Serge ter Braake | Niels Ockeloen | Piek Vossen | Susan Legêne | Guus Schreiber
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

When NLP is used to support research in the humanities, new methodological issues come into play. NLP methods may introduce a bias in their analysis that can influence the results of the hypothesis a humanities scholar is testing. This paper addresses this issue in the context of BiographyNet a multi-disciplinary project involving NLP, Linked Data and history. We introduce the project to the NLP community. We argue that it is essential for historians to get insight into the provenance of information, including how information was extracted from text by NLP tools.

Hope and Fear: How Opinions Influence Factuality
Chantal van Son | Marieke van Erp | Antske Fokkens | Piek Vossen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Both sentiment and event factuality are fundamental information levels for our understanding of events mentioned in news texts. Most research so far has focused on either modeling opinions or factuality. In this paper, we propose a model that combines the two for the extraction and interpretation of perspectives on events. By doing so, we can explain the way people perceive changes in (their belief of) the world as a function of their fears of changes to the bad or their hopes of changes to the good. This study seeks to examine the effectiveness of this approach by applying factuality annotations, based on FactBank, on top of the MPQA Corpus, a corpus containing news texts annotated for sentiments and other private states. Our findings suggest that this approach can be valuable for the understanding of perspectives, but that there is still some work to do on the refinement of the integration.

NewsReader: recording history from daily news streams
Piek Vossen | German Rigau | Luciano Serafini | Pim Stouten | Francis Irving | Willem Van Hage
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The European project NewsReader develops technology to process daily news streams in 4 languages, extracting what happened, when, where and who was involved. NewsReader does not just read a single newspaper but massive amounts of news coming from thousands of sources. It compares the results across sources to complement information and determine where they disagree. Furthermore, it merges news of today with previous news, creating a long-term history rather than separate events. The result is stored in a KnowledgeStore, that cumulates information over time, producing an extremely large knowledge graph that is visualized using new techniques to provide more comprehensive access. We present the first version of the system and the results of processing first batches of data.

Discovering and Visualising Stories in News
Marieke van Erp | Gleb Satyukov | Piek Vossen | Marit Nijsen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Daily news streams often revolve around topics that span over a longer period of time such as the global financial crisis or the healthcare debate in the US. The length and depth of these stories can be such that they become difficult to track for information specialists who need to reconstruct exactly what happened for policy makers and companies. We present a framework to model stories from news: we describe the characteristics that make up interesting stories, how these translate to filters on our data and we present a first use case in which we detail the steps to visualising story lines extracted from news articles about the global automotive industry.

Using a sledgehammer to crack a nut? Lexical diversity and event coreference resolution
Agata Cybulska | Piek Vossen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we examine the representativeness of the EventCorefBank (ECB, Bejan and Harabagiu, 2010) with regards to the language population of large-volume streams of news. The ECB corpus is one of the data sets used for evaluation of the task of event coreference resolution. Our analysis shows that the ECB in most cases covers one seminal event per domain, what considerably simplifies event and so language diversity that one comes across in the news. We augmented the corpus with a new corpus component, consisting of 502 texts, describing different instances of event types that were already captured by the 43 topics of the ECB, making it more representative of news articles on the web. The new “ECB+” corpus is available for further research.

Generating Polarity Lexicons with WordNet propagation in 5 languages
Isa Maks | Ruben Izquierdo | Francesca Frontini | Rodrigo Agerri | Piek Vossen | Andoni Azpeitia
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we focus on the creation of general-purpose (as opposed to domain-specific) polarity lexicons in five languages: French, Italian, Dutch, English and Spanish using WordNet propagation. WordNet propagation is a commonly used method to generate these lexicons as it gives high coverage of general purpose language and the semantically rich WordNets where concepts are organised in synonym , antonym and hyperonym/hyponym structures seem to be well suited to the identification of positive and negative words. However, WordNets of different languages may vary in many ways such as the way they are compiled, the number of synsets, number of synonyms and number of semantic relations they include. In this study we investigate whether this variability translates into differences of performance when these WordNets are used for polarity propagation. Although many variants of the propagation method are developed for English, little is known about how they perform with WordNets of other languages. We implemented a propagation algorithm and designed a method to obtain seed lists similar with respect to quality and size, for each of the five languages. We evaluated the results against gold standards also developed according to a common method in order to achieve as less variance as possible between the different languages.

Proceedings of the Seventh Global Wordnet Conference
Heili Orav | Christiane Fellbaum | Piek Vossen
Proceedings of the Seventh Global Wordnet Conference

What implementation and translation teach us: the case of semantic similarity measures in wordnets
Marten Postma | Piek Vossen
Proceedings of the Seventh Global Wordnet Conference

2013

Offspring from Reproduction Problems: What Replication Failure Teaches Us
Antske Fokkens | Marieke van Erp | Marten Postma | Ted Pedersen | Piek Vossen | Nuno Freire
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Semantic Relations between Events and their Time, Locations and Participants for Event Coreference Resolution
Agata Cybulska | Piek Vossen
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions?
Isa Maks | Piek Vossen
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

DutchSemCor: in quest of the ideal sense-tagged corpus
Piek Vossen | Rubén Izquierdo | Attila Görög
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

GAF: A Grounded Annotation Framework for Events
Antske Fokkens | Marieke van Erp | Piek Vossen | Sara Tonelli | Willem Robert van Hage | Luciano Serafini | Rachele Sprugnoli | Jesper Hoeksema
Workshop on Events: Definition, Detection, Coreference, and Representation

2012

DutchSemCor: Targeting the ideal sense-tagged corpus
Piek Vossen | Attila Görög | Rubén Izquierdo | Antal van den Bosch
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Word Sense Disambiguation (WSD) systems require large sense-tagged corpora along with lexical databases to reach satisfactory results. The number of English language resources for developed WSD increased in the past years while most other languages are still under-resourced. The situation is no different for Dutch. In order to overcome this data bottleneck, the DutchSemCor project will deliver a Dutch corpus that is sense-tagged with senses from the Cornetto lexical database. In this paper, we discuss the different conflicting requirements for a sense-tagged corpus and our strategies to fulfill them. We report on a first series of experiments to sup- port our semi-automatic approach to build the corpus.

Mapping WordNet to the Kyoto ontology
Egoitz Laparra | German Rigau | Piek Vossen
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the connection of WordNet to a generic ontology based on DOLCE. We developed a complete set of heuristics for mapping all WordNet nouns, verbs and adjectives to the ontology. Moreover, the mapping also allows to represent predicates in a uniform and interoperable way, regardless of the way they are expressed in the text and in which language. Together with the ontology, the WordNet mappings provide a extremely rich and powerful basis for semantic processing of text in any domain. In particular, the mapping has been used in a knowledge-rich event-mining system developed for the Asian-European project KYOTO.

Building a fine-grained subjectivity lexicon from a web corpus
Isa Maks | Piek Vossen
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we propose a method to build fine-grained subjectivity lexicons including nouns, verbs and adjectives. The method, which is applied for Dutch, is based on the comparison of word frequencies of three corpora: Wikipedia, News and News comments. Comparison of the corpora is carried out with two measures: log-likelihood ratio and a percentage difference calculation. The first step of the method involves subjectivity identification, i.e. determining if a word is subjective or not. The second step aims at the identification of more fine-grained subjectivity which is the distinction between actor subjectivity and speaker / writer subjectivity. The results suggest that this approach can be usefully applied producing subjectivity lexicons of high quality.

2011

Historical Event Extraction from Text
Agata Katarzyna Cybulska | Piek Vossen
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

A verb lexicon model for deep sentiment analysis and opinion mining applications
Isa Maks | Piek Vossen
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

2010

Annotation Scheme and Gold Standard for Dutch Subjective Adjectives
Isa Maks | Piek Vossen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Many techniques are developed to derive automatically lexical resources for opinion mining. In this paper we present a gold standard for Dutch adjectives developed for the evaluation of these techniques. In the first part of the paper we introduce our annotation guidelines. They are based upon guidelines recently developed for English which annotate subjectivity and polarity at word sense level. In addition to subjectivity and polarity we propose a third annotation category: that of the attitude holder. The identity of the attitude holder is partly implied by the word itself and may provide useful information for opinion mining systems. In the second part of paper we present the criteria adopted for the selection of items which should be included in this gold standard. Our design is aimed at an equal representation of all dimensions of the lexicon , like frequency and polysemy, in order to create a gold standard which can be used not only for benchmarking purposes but also may help to improve in a systematic way, the methods which derive the word lists. Finally we present the results of the annotation task including annotator agreement rates and disagreement analysis.

Event Models for Historical Perspectives: Determining Relations between High and Low Level Events in Text, Based on the Classification of Time, Location and Participants.
Agata Cybulska | Piek Vossen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we report on a study that was performed within the Semantics of History project on how descriptions of historical events are realized in different types of text and what the implications are for modeling the event information. We believe that different historical perspectives of writers correspond in some degree with genre distinction and correlate with variation in language use. To capture differences between event representations in diverse text types and thus to identify relations between historical events, we defined an event model. We observed clear relations between particular parts of event descriptions - actors, time and location modifiers. Texts, written shortly after an event happened, use more specific and uniquely occurring event descriptions than texts describing the same events but written from a longer time perspective. We carried out some statistical corpus research to confirm this hypothesis. The ability to automatically determine relations between historical events and their sub-events over textual data, based on the relations between event participants, time markers and locations, will have important repercussions for the design of historical information retrieval systems.

Computer Assisted Semantic Annotation in the DutchSemCor Project
Attila Görög | Piek Vossen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The goal of this paper is to describe the annotation protocols and the Semantic Annotation Tool (SAT) used in the DutchSemCor project. The DutchSemCor project is aiming at aligning the Cornetto lexical database with the Dutch language corpus SoNaR. 250K corpus occurrences of the 3,000 most frequent and most ambiguous Dutch nouns, adjectives and verbs are being annotated manually using the SAT. This data is then used for bootstrapping 750K extra occurrences which in turn will be checked manually. Our main focus in this paper is the methodology applied in the project to attain the envisaged Inter-annotator Agreement (IA) of =80%. We will also discuss one of the main objectives of DutchSemCor i.e. to provide semantically annotated language data with high scores for quantity, quality and diversity. Sample data with high scores for these three features can yield better results for co-training WSD systems. Finally, we will take a brief look at our annotation tool.

Facilitating Non-expert Users of the KYOTO Platform: the TMEKO Editing Protocol for Synset to Ontology Mappings
Roxane Segers | Piek Vossen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the general architecture of the TMEKO protocol (Tutoring Methodology for Enriching the Kyoto Ontology) that guides non-expert users through the process of creating mappings from domain wordnet synsets to a shared ontology by answering natural language questions. TMEKO will be part of a Wiki-like community platform currently developed in the Kyoto project (http://www.kyoto-project.eu). The platform provides the architecture for ontology based fact mining to enable knowledge sharing across languages and cultures. A central part of the platform is the Wikyoto editing environment in which users can create their own domain wordnet for seven different languages and define relations to the central and shared ontology based on DOLCE. A substantial part of the mappings will involve important processes and qualities associated with the concept. Therefore, the TMEKO protocol provides specific interviews for creating complex mappings that go beyond subclass and equivalence relations. The Kyoto platform and the TMEKO protocol are developed and applied to the environment domain for seven different languages (English, Dutch, Italian, Spanish, Basque, Japanese and Chinese), but can easily be extended and adapted to other languages and domains.

Integrating a Large Domain Ontology of Species into WordNet
Montse Cuadros | Egoitz Laparra | German Rigau | Piek Vossen | Wauter Bosma
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

With the proliferation of applications sharing information represented in multiple ontologies, the development of automatic methods for robust and accurate ontology matching will be crucial to their success. Connecting and merging already existing semantic networks is perhaps one of the most challenging task related to knowledge engineering. This paper presents a new approach for aligning automatically a very large domain ontology of Species to WordNet in the framework of the KYOTO project. The approach relies on the use of knowledge-based Word Sense Disambiguation algorithm which accurately assigns WordNet synsets to the concepts represented in Species 2000.

Bootstrapping Language Neutral Term Extraction
Wauter Bosma | Piek Vossen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

A variety of methods exist for extracting terms and relations between terms from a corpus, each of them having strengths and weaknesses. Rather than just using the joint results, we apply different extraction methods in a way that the results of one method are input to another. This gives us the leverage to find terms and relations that otherwise would not be found. Our goal is to create a semantic model of a domain. To that end, we aim to find the complete terminology of the domain, consisting of terms and relations such as hyponymy and meronymy, and connected to generic wordnets and ontologies. Terms are ranked by domain-relevance only as a final step, after terminology extraction is completed. Because term relations are a large part of the semantics of a term, we estimate the relevance from its relation to other terms, in addition to occurrence and document frequencies. In the KYOTO project, we apply language-neutral terminology extraction from a parsed corpus for seven languages.

SemEval-2010 Task 17: All-Words Word Sense Disambiguation on a Specific Domain
Eneko Agirre | Oier Lopez de Lacalle | Christiane Fellbaum | Shu-Kai Hsieh | Maurizio Tesconi | Monica Monachini | Piek Vossen | Roxanne Segers
Proceedings of the 5th International Workshop on Semantic Evaluation

Kyoto: An Integrated System for Specific Domain WSD
Aitor Soroa | Eneko Agirre | Oier Lopez de Lacalle | Wauter Bosma | Piek Vossen | Monica Monachini | Jessie Lo | Shu-Kai Hsieh
Proceedings of the 5th International Workshop on Semantic Evaluation

Proceedings of the 6th Workshop on Ontologies and Lexical Resources
Alessandro Oltramari | Piek Vossen | Qin Lu
Proceedings of the 6th Workshop on Ontologies and Lexical Resources

KYOTO: an open platform for mining facts
Piek Vossen | German Rigau | Eneko Agirre | Aitor Soroa | Monica Monachini | Roberto Bartolini
Proceedings of the 6th Workshop on Ontologies and Lexical Resources

2009

SemEval-2010 Task 17: All-words Word Sense Disambiguation on a Specific Domain
Eneko Agirre | Oier Lopez de Lacalle | Christiane Fellbaum | Andrea Marchetti | Antonio Toral | Piek Vossen
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

2008

Integrating Lexical Units, Synsets and Ontology in the Cornetto Database
Piek Vossen | Isa Maks | Roxane Segers | Hennie VanderVliet
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Cornetto is a two-year Stevin project (project number STE05039) in which a lexical semantic database is built that combines Wordnet with Framenet-like information for Dutch. The combination of the two lexical resources (the Dutch Wordnet and the Referentie Bestand Nederlands) will result in a much richer relational database that may improve natural language processing (NLP) technologies, such as word sense-disambiguation, and language-generation systems. In addition to merging the Dutch lexicons, the database is also mapped to a formal ontology to provide a more solid semantic backbone. Since the database represents different traditions and perspectives of semantic organization, a key issue in the project is the alignment of concepts across the resources. This paper discusses our methodology to first automatically align the word meanings and secondly to manually revise the most critical cases.

Adjectives in the Dutch Semantic Lexical Database CORNETTO
Isa Maks | Piek Vossen | Roxane Segers | Hennie van der Vliet
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The goal of this paper is to describe how adjectives are encoded in Cornetto, a semantic lexical database for Dutch. Cornetto combines two existing lexical resources with different semantic organisation, i.e. Dutch Wordnet (DWN) with a synset organisation and Referentie Bestand Nederlands (RBN) with an organisation in Lexical Units. Both resources will be aligned and mapped on the formal ontology SUMO. In this paper, we will first present details of the description of adjectives in each of the the two resources. We will then address the problems that are encountered during alignment to the SUMO ontology which are greatly due to the fact that SUMO has never been tested for its adequacy with respect to adjectives. We contrasted SUMO with an existing semantic classification which resulted in a further refined and extended SUMO geared for the description of adjectives.

We outline work performed within the framework of a current EC project. The goal is to construct a language-independent information system for a specific domain (environment/ecology/biodiversity) anchored in a language-independent ontology that is linked to wordnets in seven languages. For each language, information extraction and identification of lexicalized concepts with ontological entries is carried out by text miners (Kybots). The mapping of language-specific lexemes to the ontology allows for crosslinguistic identification and translation of equivalent terms. The infrastructure developed within this project enables long-range knowledge sharing and transfer across many languages and cultures, addressing the need for global and uniform transition of knowledge beyond the specific domains addressed here.

2007

SemEval-2007 Task 01: Evaluating WSD on Cross-Language Information Retrieval
Eneko Agirre | Bernardo Magnini | Oier Lopez de Lacalle | Arantxa Otegi | German Rigau | Piek Vossen
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

Arabic WordNet and the Challenges of Arabic
Sabri Elkateb | William Black | Piek Vossen | David Farwell | Horacio Rodríguez | Adam Pease | Musa Alkhalifa | Christiane Fellbaum
Proceedings of the International Conference on the Challenge of Arabic for NLP/MT

Arabic WordNet is a lexical resource for Modern Standard Arabic based on the widely used Princeton WordNet for English (Fellbaum, 1998). Arabic WordNet (AWN) is based on the design and contents of the universally accepted Princeton WordNet (PWN) and will be mappable straightforwardly onto PWN 2.0 and EuroWordNet (EWN), enabling translation on the lexical level to English and dozens of other languages. We have developed and linked the AWN with the Suggested Upper Merged Ontology (SUMO), where concepts are defined with machine interpretable semantics in first order logic (Niles and Pease, 2001). We have greatly extended the ontology and its set of mappings to provide formal terms and definitions for each synset. The end product would be a linguistic resource with a deep formal semantic foundation that is able to capture the richness of Arabic as described in Elkateb (2005). Tools we have developed as part of this effort include a lexicographer's interface modeled on that used for EuroWordNet, with added facilities for Arabic script, following Black and Elkateb's earlier work (2004). In this paper we describe our methodology for building a lexical resource in Arabic and the challenge of Arabic for lexical resources.

Building a WordNet for Arabic
Sabri Elkateb | William Black | Horacio Rodríguez | Musa Alkhalifa | Piek Vossen | Adam Pease | Christiane Fellbaum
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper introduces a recently initiated project that focuses on building a lexical resource for Modern Standard Arabic based on the widely used Princeton WordNet for English (Fellbaum, 1998). Our aim is to develop a linguistic resource with a deep formal semantic foundation in order to capture the richness of Arabic as described in Elkateb (2005). Arabic WordNet is being constructed following methods developed for EuroWordNet (Vossen, 1998). In addition to the standard wordnet representation of senses, word meanings are also being defined with a machine understandable semantics in first order logic. The basis for this semantics is the Suggested Upper Merged Ontology and its associated domain ontologies (Niles and Pease, 2001). We will greatly extend the ontology and its set of mappings to provide formal terms and definitions for each synset. Tools to be developed as part of this effort include a lexicographer's interface modeled on that used for EuroWordNet, with added facilities for Arabic script, following Black and Elkateb's earlier work (2004).

2002

MEANING: a Roadmap to Knowledge Technologies
German Rigau | Bernardo Magnini | Eneko Agirre | Piek Vossen | John Carroll
COLING-02: A Roadmap for Computational Linguistics

1999

Towards a Universal Index of Meaning
Piek Vossen | Wim Peters | Julio Gonzalo
SIGLEX99: Standardizing Lexical Resources

1997

Multilingual design of EuroWordNet
Piek Vossen | Pedro Diez-Orzas | Wim Peters
Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications

1991

In So Many Words Knowledge as a Lexical Phenomenon
Willem Meijs | Piek Vossen
Lexical Semantics and Knowledge Representation

Co-authors

Christiane Fellbaum 11

Roxane Segers 10

Filip Ilievski 9

Marieke van Erp 9

Rubén Izquierdo 6

Roser Morante 6

Pia Sommerauer 6

Chantal van Son 6

Agata Cybulska 5

Egoitz Laparra 5

Levi Remijnse 5

Oier Lopez de Lacalle 4

Monica Monachini 4

Wondimagegnhue Tufa 4

Emiel Van Miltenburg 4

Stella Verkijk 4

Willem Robert van Hage 4

Selene Baez Santamaria 3

Alexandra Balahur 3

David Caswell 3

Desmond Elliott 3

Attila Görög 3

Shu-Kai Hsieh 3

Marco Rospocher 3

Musa Alkhalifa 2

Lisa Beinborn 2

William J. Black 2

Nicoletta Calzolari 2

Sabri Elkateb 2

Catholijn M. Jonker 2

Bernardo Magnini 2

Andrea Marchetti 2

John Philip McCrae 2

Anne-Lyse Minard 2

Teruko Mitamura 2

Andrés Montoyo 2

Pradeep K. Murukannaiah 2

Martha Palmer 2

Horacio Rodríguez 2

Stefan Schouten 2

Luciano Serafini 2

Maurizio Tesconi 2

Sam Titarsolej 2

Michiel Van Der Meer 2

Erik van der Goot 2

Hennie van der Vliet 2

Rodrigo Agerri 1

Itziar Aldabe 1

Sophie I. Arnoult 1

Jordi Atserias 1

Andoni Azpeitia 1

Baran Barbarestani 1

Verginica Barbu Mititelu 1

Roberto Bartolini 1

Claire Bonial 1

António Branco 1

Susan Windisch Brown 1

John A. Carroll 1

Guillem Collell 1

Montse Cuadros 1

Pedro Diez-Orzas 1

Willem Elbers 1

David Farwell 1

Mark Finlayson 1

Adrian Flanagan 1

Corina Forăscu 1

Francesca Frontini 1

Edwin Geleijn 1

Julio Gonzalo 1

Jesper Hoeksema 1

Rinke Hoekstra 1

Chu-Ren Huang 1

Francis Irving 1

Hitoshi Isahara 1

Kyoko Kanzaki 1

Yiota Kontzopoulou 1

Inger Leemans 1

Susan Legêne 1

Carel Meskers 1

Caroline Meskers 1

Gosse Minnema 1

André Moreira 1

Federico Neri 1

Axel-Cyrille Ngonga Ngomo 1

Malvina Nissim 1

Niels Ockeloen 1

Alessandro Oltramari 1

Petya Osenova 1

Arantxa Otegi 1

Lodewijk Petram 1

Marten Postrma 1

Remo Raffaelli 1

Gleb Satyukov 1

Stefan Schlobach 1

Anneleen Schoen 1

Guus Schreiber 1

Rachele Sprugnoli 1

Kuan Eeik Tan 1

Antonio Toral 1

Christina Unger 1

Jacopo Urbani 1

Guy Widdershoven 1

Serge ter Braake 1

Maarten van Meersbergen 1

Gertjan van Noord 1

Dieter van Uytvanck 1

Antal van den Bosch 1

Marieke van der Leeden 1

Sabina van der Veen 1

Janneke van der Zwaan 1

Venues