Leo Wanner

2025

pdf bib abs

What the #?*!: Disentangling Hate Across Target Identities
Yiping Jin | Leo Wanner | Aneesh Moideen Koya
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Hate speech (HS) classifiers do not perform equally well in detecting hateful expressions towards different target identities. They also demonstrate systematic biases in predicted hatefulness scores. Tapping on two recently proposed functionality test datasets for HS detection, we quantitatively analyze the impact of different factors on HS prediction. Experiments on popular industrial and academic models demonstrate that HS detectors assign a higher hatefulness score merely based on the mention of specific target identities. Besides, models often confuse hatefulness and the polarity of emotions. This result is worrisome as the effort to build HS detectors might harm the vulnerable identity groups we wish to protect: posts expressing anger or disapproval of hate expressions might be flagged as hateful themselves. We also carry out a study inspired by social psychology theory, which reveals that the accuracy of hatefulness prediction correlates strongly with the intensity of the stereotype.

pdf bib

Proceedings of the 31st International Conference on Computational Linguistics
Owen Rambow | Leo Wanner | Marianna Apidianaki | Hend Al-Khalifa | Barbara Di Eugenio | Steven Schockaert
Proceedings of the 31st International Conference on Computational Linguistics

pdf bib abs

Exploring morphology-aware tokenization: A case study on Spanish language modeling
Alba Táboas García | Piotr Przybyła | Leo Wanner
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

This paper investigates to what extent the integration of morphological information can improve subword tokenization and thus also language modeling performance. We focus on Spanish, a language with fusional morphology, where subword segmentation can benefit from linguistic structure. Instead of relying on purely data-driven strategies like Byte Pair Encoding (BPE), we explore a linguistically grounded approach: training a tokenizer on morphologically segmented data. To do so, we develop a semi-supervised segmentation model for Spanish, building gold-standard datasets to guide and evaluate it. We then use this tokenizer to pre-train a masked language model and assess its performance on several downstream tasks. Our results show improvements over a baseline with a standard tokenizer, supporting our hypothesis that morphology-aware tokenization offers a viable and principled alternative for improving language modeling.

pdf bib abs

Multilingual Skill Extraction for Job Vacancy–Job Seeker Matching in Knowledge Graphs
Hamit Kavas | Marc Serra-Vidal | Leo Wanner
Proceedings of the Workshop on Generative AI and Knowledge Graphs (GenAIK)

In the modern labor market, accurate matching of job vacancies with suitable candidate CVs is critical. We present a novel multilingual knowledge graph-based framework designed to enhance the matching by accurately extracting the skills requested by a job and provided by a job seeker in a multilingual setting and aligning them via the standardized skill labels of the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy. The proposed framework employs a combination of state-of-the-art techniques to extract relevant skills from job postings and candidate experiences. These extracted skills are then filtered and mapped to the ESCO taxonomy and integrated into a multilingual knowledge graph that incorporates hierarchical relationships and cross-linguistic variations through embeddings. Our experiments demonstrate a significant improvement of the matching quality compared to the state of the art.

pdf bib

pdf bib abs

Assessing the Agreement Competence of Large Language Models
Alba Táboas García | Leo Wanner
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)

While the competence of LLMs to cope with agreement constraints has been widely tested in English, only a very limited number of works deals with morphologically rich(er) languages. In this work, we experiment with 25 mono- and multilingual LLMs, applying them to a collection of more than 5,000 test examples that cover the main agreement phenomena in three Romance languages (Italian, Portuguese, and Spanish) and one Slavic Language (Russian). We identify which of the agreement phenomena are most difficult for which models and challenge some common assumptions of what makes a good model. The test suites into which the test examples are organized are openly available and can be easily adapted to other agreement phenomena and other languages for further research.

2024

pdf bib abs

GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?
Yiping Jin | Leo Wanner | Alexander Shvets
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Online hate detection suffers from biases incurred in data sampling, annotation, and model pre-training. Therefore, measuring the averaged performance over all examples in held-out test data is inadequate. Instead, we must identify specific model weaknesses and be informed when it is more likely to fail. A recent proposal in this direction is HateCheck, a suite for testing fine-grained model functionalities on synthesized data generated using templates of the kind “You are just a [slur] to me.” However, despite enabling more detailed diagnostic insights, the HateCheck test cases are often generic and have simplistic sentence structures that do not match the real-world data. To address this limitation, we propose GPT-HateCheck, a framework to generate more diverse and realistic functional tests from scratch by instructing large language models (LLMs). We employ an additional natural language inference (NLI) model to verify the generations. Crowd-sourced annotation demonstrates that the generated test cases are of high quality. Using the new functional tests, we can uncover model weaknesses that would be overlooked using the original HateCheck dataset.

pdf bib abs

Enhancing Job Posting Classification with Multilingual Embeddings and Large Language Models
Hamit Kavas | Marc Serra-Vidal | Leo Wanner
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

In the modern labour market, taxonomies such the European Skills, Competences, Qualifications and Occupations (ESCO) classification are used as an interlingua to match job postings with job seeker profiles. Both are classified with respect to ESCO occupations, and match if they align with the same occupation and the same skills assigned to the occupation. However, matching models usually struggle with the classification because of overlapping skills and similar definitions of occupations defined in the ESCO taxonomy. This often leads to imprecise classification outcomes. In this paper, we focus on the challenge of the classification of job postings written in Italian or Spanish against ESCO occupations written in English. We experiment with multilingual embeddings, zero-shot classification, and use of a large language model (LLM) and show that the use of an LLM leads to best results.

2023

pdf bib abs

Towards Weakly-Supervised Hate Speech Classification Across Datasets
Yiping Jin | Leo Wanner | Vishakha Kadam | Alexander Shvets
The 7th Workshop on Online Abuse and Harms (WOAH)

As pointed out by several scholars, current research on hate speech (HS) recognition is characterized by unsystematic data creation strategies and diverging annotation schemata. Subsequently, supervised-learning models tend to generalize poorly to datasets they were not trained on, and the performance of the models trained on datasets labeled using different HS taxonomies cannot be compared. To ease this problem, we propose applying extremely weak supervision that only relies on the class name rather than on class samples from the annotated data. We demonstrate the effectiveness of a state-of-the-art weakly-supervised text classification model in various in-dataset and cross-dataset settings. Furthermore, we conduct an in-depth quantitative and qualitative analysis of the source of poor generalizability of HS classification models.

pdf bib abs

We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction was discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority of human evaluations in NLP is not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP.

2022

pdf bib abs

Directions for NLP Practices Applied to Online Hate Speech Detection
Paula Fortuna | Monica Dominguez | Leo Wanner | Zeerak Talat
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Addressing hate speech in online spaces has been conceptualized as a classification task that uses Natural Language Processing (NLP) techniques. Through this conceptualization, the hate speech detection task has relied on common conventions and practices from NLP. For instance, inter-annotator agreement is conceptualized as a way to measure dataset quality and certain metrics and benchmarks are used to assure model generalization. However, hate speech is a deeply complex and situated concept that eludes such static and disembodied practices. In this position paper, we critically reflect on these methodologies for hate speech detection, we argue that many conventions in NLP are poorly suited for the problem and encourage researchers to develop methods that are more appropriate for the task.

pdf bib abs

Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers
Luis Espinosa Anke | Alexander Shvets | Alireza Mohammadshahi | James Henderson | Leo Wanner
Proceedings of the 11th Joint Conference on Lexical and Computational Semantics

Recognizing and categorizing lexical collocations in context is useful for language learning, dictionary compilation and downstream NLP. However, it is a challenging task due to the varying degrees of frozenness lexical collocations exhibit. In this paper, we put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context. Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.

2021

pdf bib abs

Cartography of Natural Language Processing for Social Good (NLP4SG): Searching for Definitions, Statistics and White Spots
Paula Fortuna | Laura Pérez-Mayos | Ahmed AbuRa’ed | Juan Soler-Company | Leo Wanner
Proceedings of the 1st Workshop on NLP for Positive Impact

The range of works that can be considered as developing NLP for social good (NLP4SG) is enormous. While many of them target the identification of hate speech or fake news, there are others that address, e.g., text simplification to alleviate consequences of dyslexia, or coaching strategies to fight depression. However, so far, there is no clear picture of what areas are targeted by NLP4SG, who are the actors, which are the main scenarios and what are the topics that have been left aside. In order to obtain a clearer view in this respect, we first propose a working definition of NLP4SG and identify some primary aspects that are crucial for NLP4SG, including, e.g., areas, ethics, privacy and bias. Then, we draw upon a corpus of around 50,000 articles downloaded from the ACL Anthology. Based on a list of keywords retrieved from the literature and revised in view of the task, we select from this corpus articles that can be considered to be on NLP4SG according to our definition and analyze them in terms of trends along the time line, etc. The result is a map of the current NLP4SG research and insights concerning the white spots on this map.

pdf bib abs

Targets and Aspects in Social Media Hate Speech
Alexander Shvets | Paula Fortuna | Juan Soler | Leo Wanner
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

Mainstream research on hate speech focused so far predominantly on the task of classifying mainly social media posts with respect to predefined typologies of rather coarse-grained hate speech categories. This may be sufficient if the goal is to detect and delete abusive language posts. However, removal is not always possible due to the legislation of a country. Also, there is evidence that hate speech cannot be successfully combated by merely removing hate speech posts; they should be countered by education and counter-narratives. For this purpose, we need to identify (i) who is the target in a given hate speech post, and (ii) what aspects (or characteristics) of the target are attributed to the target in the post. As the first approximation, we propose to adapt a generic state-of-the-art concept extraction model to the hate speech domain. The outcome of the experiments is promising and can serve as inspiration for further work on the task

pdf bib

Assessing the Syntactic Capabilities of Transformer-based Multilingual Language Models
Laura Pérez-Mayos | Alba Táboas García | Simon Mille | Leo Wanner
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib abs

How much pretraining data do language models need to learn syntax?
Laura Pérez-Mayos | Miguel Ballesteros | Leo Wanner
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Transformers-based pretrained language models achieve outstanding results in many well-known NLU benchmarks. However, while pretraining methods are very convenient, they are expensive in terms of time and resources. This calls for a study of the impact of pretraining data size on the knowledge of the models. We explore this impact on the syntactic capabilities of RoBERTa, using models trained on incremental sizes of raw text data. First, we use syntactic structural probes to determine whether models pretrained on more data encode a higher amount of syntactic information. Second, we perform a targeted syntactic evaluation to analyze the impact of pretraining data size on the syntactic generalization performance of the models. Third, we compare the performance of the different models on three downstream applications: part-of-speech tagging, dependency parsing and paraphrase identification. We complement our study with an analysis of the cost-benefit trade-off of training such models. Our experiments show that while models pretrained on more data encode more syntactic knowledge and perform better on downstream applications, they do not always offer a better performance across the different syntactic phenomena and come at a higher financial and environmental cost.

pdf bib abs

Evaluating language models for the retrieval and categorization of lexical collocations
Luis Espinosa Anke | Joan Codina-Filba | Leo Wanner
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Lexical collocations are idiosyncratic combinations of two syntactically bound lexical items (e.g., “heavy rain” or “take a step”). Understanding their degree of compositionality and idiosyncrasy, as well their underlying semantics, is crucial for language learners, lexicographers and downstream NLP applications. In this paper, we perform an exhaustive analysis of current language models for collocation understanding. We first construct a dataset of apparitions of lexical collocations in context, categorized into 17 representative semantic categories. Then, we perform two experiments: (1) unsupervised collocate retrieval using BERT, and (2) supervised collocation classification in context. We find that most models perform well in distinguishing light verb constructions, especially if the collocation’s first argument acts as subject, but often fail to distinguish, first, different syntactic structures within the same semantic category, and second, fine-grained semantic categories which restrict the use of small sets of valid collocates for a given base.

pdf bib abs

On the evolution of syntactic information encoded by BERT’s contextualized representations
Laura Pérez-Mayos | Roberto Carlini | Miguel Ballesteros | Leo Wanner
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The adaptation of pretrained language models to solve supervised tasks has become a baseline in NLP, and many recent works have focused on studying how linguistic information is encoded in the pretrained sentence representations. Among other information, it has been shown that entire syntax trees are implicitly embedded in the geometry of such models. As these models are often fine-tuned, it becomes increasingly important to understand how the encoded knowledge evolves along the fine-tuning. In this paper, we analyze the evolution of the embedded syntax trees along the fine-tuning process of BERT for six different tasks, covering all levels of the linguistic structure. Experimental results show that the encoded syntactic information is forgotten (PoS tagging), reinforced (dependency and constituency parsing) or preserved (semantics-related tasks) in different ways along the fine-tuning process depending on the task.

2020

pdf bib abs

CollFrEn: Rich Bilingual English–French Collocation Resource
Beatriz Fisas | Luis Espinosa Anke | Joan Codina-Filbá | Leo Wanner
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

Collocations in the sense of idiosyncratic lexical co-occurrences of two syntactically bound words traditionally pose a challenge to language learners and many Natural Language Processing (NLP) applications alike. Reliable ground truth (i.e., ideally manually compiled) resources are thus of high value. We present a manually compiled bilingual English–French collocation resource with 7,480 collocations in English and 6,733 in French. Each collocation is enriched with information that facilitates its downstream exploitation in NLP tasks such as machine translation, word sense disambiguation, natural language generation, relation classification, and so forth. Our proposed enrichment covers: the semantic category of the collocation (its lexical function), its vector space representation (for each individual word as well as their joint collocation embedding), a subcategorization pattern of both its elements, as well as their corresponding BabelNet id, and finally, indices of their occurrences in large scale reference corpora.

pdf bib abs

Toxic, Hateful, Offensive or Abusive? What Are We Really Classifying? An Empirical Analysis of Hate Speech Datasets
Paula Fortuna | Juan Soler | Leo Wanner
Proceedings of the Twelfth Language Resources and Evaluation Conference

The field of the automatic detection of hate speech and related concepts has raised a lot of interest in the last years. Different datasets were annotated and classified by means of applying different machine learning algorithms. However, few efforts were done in order to clarify the applied categories and homogenize different datasets. Our study takes up this demand. We analyze six different publicly available datasets in this field with respect to their similarity and compatibility. We conduct two different experiments. First, we try to make the datasets compatible and represent the dataset classes as Fast Text word vectors analyzing the similarity between different classes in a intra and inter dataset manner. Second, we submit the chosen datasets to the Perspective API Toxicity classifier, achieving different performances depending on the categories and datasets. One of the main conclusions of these experiments is that many different definitions are being used for equivalent concepts, which makes most of the publicly available datasets incompatible. Grounded in our analysis, we provide guidelines for future dataset collection and annotation.

pdf bib abs

The Third Multilingual Surface Realisation Shared Task (SR’20): Overview and Evaluation Results
Simon Mille | Anya Belz | Bernd Bohnet | Thiago Castro Ferreira | Yvette Graham | Leo Wanner
Proceedings of the Third Workshop on Multilingual Surface Realisation

This paper presents results from the Third Shared Task on Multilingual Surface Realisation (SR’20) which was organised as part of the COLING’20 Workshop on Multilingual Surface Realisation. As in SR’18 and SR’19, the shared task comprised two tracks: (1) a Shallow Track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (2) a Deep Track where additionally, functional words and morphological information were removed. Moreover, each track had two subtracks: (a) restricted-resource, where only the data provided or approved as part of a track could be used for training models, and (b) open-resource, where any data could be used. The Shallow Track was offered in 11 languages, whereas the Deep Track in 3 ones. Systems were evaluated using both automatic metrics and direct assessment by human evaluators in terms of Readability and Meaning Similarity to reference outputs. We present the evaluation results, along with descriptions of the SR’19 tracks, data and evaluation methods, as well as brief summaries of the participating systems. For full descriptions of the participating systems, please see the separate system reports elsewhere in this volume.

pdf bib abs

In this paper, we present a pipeline system that generates architectural landmark descriptions using textual, visual and structured data. The pipeline comprises five main components:(i) a textual analysis component, which extracts information from Wikipedia pages; (ii)a visual analysis component, which extracts information from copyright-free images; (iii) a retrieval component, which gathers relevant (property, subject, object) triples from DBpedia; (iv) a fusion component, which stores the contents from the different modalities in a Knowledge Base (KB) and resolves the conflicts that stem from using different sources of information; (v) an NLG component, which verbalises the resulting contents of the KB. We show that thanks to the addition of other modalities, we can make the verbalisation of DBpedia triples more relevant and/or inspirational.

pdf bib

pdf bib abs

ThemePro: A Toolkit for the Analysis of Thematic Progression
Monica Dominguez | Juan Soler | Leo Wanner
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper introduces ThemePro, a toolkit for the automatic analysis of thematic progression. Thematic progression is relevant to natural language processing (NLP) applications dealing, among others, with discourse structure, argumentation structure, natural language generation, summarization and topic detection. A web platform demonstrates the potential of this toolkit and provides a visualization of the results including syntactic trees, hierarchical thematicity over propositions and thematic progression over whole texts.

2019

pdf bib abs

A Hierarchically-Labeled Portuguese Hate Speech Dataset
Paula Fortuna | João Rocha da Silva | Juan Soler-Company | Leo Wanner | Sérgio Nunes
Proceedings of the Third Workshop on Abusive Language Online

Over the past years, the amount of online offensive speech has been growing steadily. To successfully cope with it, machine learning are applied. However, ML-based techniques require sufficiently large annotated datasets. In the last years, different datasets were published, mainly for English. In this paper, we present a new dataset for Portuguese, which has not been in focus so far. The dataset is composed of 5,668 tweets. For its annotation, we defined two different schemes used by annotators with different levels of expertise. Firstly, non-experts annotated the tweets with binary labels (‘hate’ vs. ‘no-hate’). Secondly, expert annotators classified the tweets following a fine-grained hierarchical multiple label scheme with 81 hate speech categories in total. The inter-annotator agreement varied from category to category, which reflects the insight that some types of hate speech are more subtle than others and that their detection depends on personal perception. This hierarchical annotation scheme is the main contribution of the presented work, as it facilitates the identification of different types of hate speech and their intersections. To demonstrate the usefulness of our dataset, we carried a baseline classification experiment with pre-trained word embeddings and LSTM on the binary classified data, with a state-of-the-art outcome.

pdf bib abs

The Second Multilingual Surface Realisation Shared Task (SR’19): Overview and Evaluation Results
Simon Mille | Anja Belz | Bernd Bohnet | Yvette Graham | Leo Wanner
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

We report results from the SR’19 Shared Task, the second edition of a multilingual surface realisation task organised as part of the EMNLP’19 Workshop on Multilingual Surface Realisation. As in SR’18, the shared task comprised two tracks with different levels of complexity: (a) a shallow track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a deep track where additionally, functional words and morphological information were removed. The shallow track was offered in eleven, and the deep track in three languages. Systems were evaluated (a) automatically, using a range of intrinsic metrics, and (b) by human judges in terms of readability and meaning similarity. This report presents the evaluation results, along with descriptions of the SR’19 tracks, data and evaluation methods. For full descriptions of the participating systems, please see the separate system reports elsewhere in this volume.

pdf bib abs

Teaching FORGe to Verbalize DBpedia Properties in Spanish
Simon Mille | Stamatia Dasiopoulou | Beatriz Fisas | Leo Wanner
Proceedings of the 12th International Conference on Natural Language Generation

Statistical generators increasingly dominate the research in NLG. However, grammar-based generators that are grounded in a solid linguistic framework remain very competitive, especially for generation from deep knowledge structures. Furthermore, if built modularly, they can be ported to other genres and languages with a limited amount of work, without the need of the annotation of a considerable amount of training data. One of these generators is FORGe, which is based on the Meaning-Text Model. In the recent WebNLG challenge (the first comprehensive task addressing the mapping of RDF triples to text) FORGe ranked first with respect to the overall quality in human evaluation. We extend the coverage of FORGE’s open source grammatical and lexical resources for English, so as to further improve the English texts, and port them to Spanish, to achieve a comparable quality. This confirms that, as already observed in the case of SimpleNLG, a robust universal grammar-driven framework and a systematic organization of the linguistic resources can be an adequate choice for NLG applications.

pdf bib abs

Collocation Classification with Unsupervised Relation Vectors
Luis Espinosa Anke | Steven Schockaert | Leo Wanner
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Lexical relation classification is the task of predicting whether a certain relation holds between a given pair of words. In this paper, we explore to which extent the current distributional landscape based on word embeddings provides a suitable basis for classification of collocations, i.e., pairs of words between which idiosyncratic lexical relations hold. First, we introduce a novel dataset with collocations categorized according to lexical functions. Second, we conduct experiments on a subset of this benchmark, comparing it in particular to the well known DiffVec dataset. In these experiments, in addition to simple word vector arithmetic operations, we also investigate the role of unsupervised relation vectors as a complementary input. While these relation vectors indeed help, we also show that lexical function classification poses a greater challenge than the syntactic and semantic relations that are typically used for benchmarks in the literature.

pdf bib

Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)
Simon Mille | Anja Belz | Bernd Bohnet | Yvette Graham | Leo Wanner
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

2018

pdf bib abs

Underspecified Universal Dependency Structures as Inputs for Multilingual Surface Realisation
Simon Mille | Anja Belz | Bernd Bohnet | Leo Wanner
Proceedings of the 11th International Conference on Natural Language Generation

In this paper, we present the datasets used in the Shallow and Deep Tracks of the First Multilingual Surface Realisation Shared Task (SR’18). For the Shallow Track, data in ten languages has been released: Arabic, Czech, Dutch, English, Finnish, French, Italian, Portuguese, Russian and Spanish. For the Deep Track, data in three languages is made available: English, French and Spanish. We describe in detail how the datasets were derived from the Universal Dependencies V2.0, and report on an evaluation of the Deep Track input quality. In addition, we examine the motivation for, and likely usefulness of, deriving NLG inputs from annotations in resources originally developed for Natural Language Understanding (NLU), and assess whether the resulting inputs supply enough information of the right kind for the final stage in the NLG process.

pdf bib abs

We report results from the SR’18 Shared Task, a new multilingual surface realisation task organised as part of the ACL’18 Workshop on Multilingual Surface Realisation. As in its English-only predecessor task SR’11, the shared task comprised two tracks with different levels of complexity: (a) a shallow track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a deep track where additionally, functional words and morphological information were removed. The shallow track was offered in ten, and the deep track in three languages. Systems were evaluated (a) automatically, using a range of intrinsic metrics, and (b) by human judges in terms of readability and meaning similarity. This report presents the evaluation results, along with descriptions of the SR’18 tracks, data and evaluation methods. For full descriptions of the participating systems, please see the separate system reports elsewhere in this volume.

pdf bib

Proceedings of the First Workshop on Multilingual Surface Realisation
Simon Mille | Anja Belz | Bernd Bohnet | Emily Pitler | Leo Wanner
Proceedings of the First Workshop on Multilingual Surface Realisation

pdf bib

Generation of a Spanish Artificial Collocation Error Corpus
Sara Rodríguez-Fernández | Roberto Carlini | Leo Wanner
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib

Compilation of Corpora for the Study of the Information Structure–Prosody Interface
Alicia Burga | Mónica Domínguez | Mireia Farrús | Leo Wanner
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib abs

Sentence Packaging in Text Generation from Semantic Graphs as a Community Detection Problem
Alexander Shvets | Simon Mille | Leo Wanner
Proceedings of the 11th International Conference on Natural Language Generation

An increasing amount of research tackles the challenge of text generation from abstract ontological or semantic structures, which are in their very nature potentially large connected graphs. These graphs must be “packaged” into sentence-wise subgraphs. We interpret the problem of sentence packaging as a community detection problem with post optimization. Experiments on the texts of the VerbNet/FrameNet structure annotated-Penn Treebank, which have been converted into graphs by a coreference merge using Stanford CoreNLP, show a high F1-score of 0.738.

2017

pdf bib abs

On the Relevance of Syntactic and Discourse Features for Author Profiling and Identification
Juan Soler-Company | Leo Wanner
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

The majority of approaches to author profiling and author identification focus mainly on lexical features, i.e., on the content of a text. We argue that syntactic and discourse features play a significantly more prominent role than they were given in the past. We show that they achieve state-of-the-art performance in author and gender identification on a literary corpus while keeping the feature set small: the used feature set is composed of only 188 features and still outperforms the winner of the PAN 2014 shared task on author verification in the literary genre.

pdf bib abs

Automatic Extraction of Parallel Speech Corpora from Dubbed Movies
Alp Öktem | Mireia Farrús | Leo Wanner
Proceedings of the 10th Workshop on Building and Using Comparable Corpora

This paper presents a methodology to extract parallel speech corpora based on any language pair from dubbed movies, together with an application framework in which some corresponding prosodic parameters are extracted. The obtained parallel corpora are especially suitable for speech-to-speech translation applications when a prosody transfer between source and target languages is desired.

pdf bib abs

FORGe at SemEval-2017 Task 9: Deep sentence generation based on a sequence of graph transducers
Simon Mille | Roberto Carlini | Alicia Burga | Leo Wanner
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We present the contribution of Universitat Pompeu Fabra’s NLP group to the SemEval Task 9.2 (AMR-to-English Generation). The proposed generation pipeline comprises: (i) a series of rule-based graph-transducers for the syntacticization of the input graphs and the resolution of morphological agreements, and (ii) an off-the-shelf statistical linearization component.

pdf bib abs

Shared Task Proposal: Multilingual Surface Realization Using Universal Dependency Trees
Simon Mille | Bernd Bohnet | Leo Wanner | Anja Belz
Proceedings of the 10th International Conference on Natural Language Generation

We propose a shared task on multilingual Surface Realization, i.e., on mapping unordered and uninflected universal dependency trees to correctly ordered and inflected sentences in a number of languages. A second deeper input will be available in which, in addition, functional words, fine-grained PoS and morphological information will be removed from the input trees. The first shared task on Surface Realization was carried out in 2011 with a similar setup, with a focus on English. We think that it is time for relaunching such a shared task effort in view of the arrival of Universal Dependencies annotated treebanks for a large number of languages on the one hand, and the increasing dominance of Deep Learning, which proved to be a game changer for NLP, on the other hand.

pdf bib

Revising the METU-Sabancı Turkish Treebank: An Exercise in Surface-Syntactic Annotation of Agglutinative Languages
Alicia Burga | Alp Öktem | Leo Wanner
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

pdf bib abs

A demo of FORGe: the Pompeu Fabra Open Rule-based Generator
Simon Mille | Leo Wanner
Proceedings of the 10th International Conference on Natural Language Generation

This demo paper presents the multilingual deep sentence generator developed by the TALN group at Universitat Pompeu Fabra, implemented as a series of rule-based graph-transducers for the syntacticization of the input graphs, the resolution of morphological agreements, and the linearization of the trees.

2016

pdf bib abs

An Automatic Prosody Tagger for Spontaneous Speech
Mónica Domínguez | Mireia Farrús | Leo Wanner
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Speech prosody is known to be central in advanced communication technologies. However, despite the advances of theoretical studies in speech prosody, so far, no large scale prosody annotated resources that would facilitate empirical research and the development of empirical computational approaches are available. This is to a large extent due to the fact that current common prosody annotation conventions offer a descriptive framework of intonation contours and phrasing based on labels. This makes it difficult to reach a satisfactory inter-annotator agreement during the annotation of gold standard annotations and, subsequently, to create consistent large scale annotations. To address this problem, we present an annotation schema for prominence and boundary labeling of prosodic phrases based upon acoustic parameters and a tagger for prosody annotation at the prosodic phrase level. Evaluation proves that inter-annotator agreement reaches satisfactory values, from 0.60 to 0.80 Cohen’s kappa, while the prosody tagger achieves acceptable recall and f-measure figures for five spontaneous samples used in the evaluation of monologue and dialogue formats in English and Spanish. The work presented in this paper is a first step towards a semi-automatic acquisition of large corpora for empirical prosodic analysis.

pdf bib abs

Extending WordNet with Fine-Grained Collocational Information via Supervised Distributional Learning
Luis Espinosa-Anke | Jose Camacho-Collados | Sara Rodríguez-Fernández | Horacio Saggion | Leo Wanner
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

WordNet is probably the best known lexical resource in Natural Language Processing. While it is widely regarded as a high quality repository of concepts and semantic relations, updating and extending it manually is costly. One important type of relation which could potentially add enormous value to WordNet is the inclusion of collocational information, which is paramount in tasks such as Machine Translation, Natural Language Generation and Second Language Learning. In this paper, we present ColWordNet (CWN), an extended WordNet version with fine-grained collocational information, automatically introduced thanks to a method exploiting linear relations between analogous sense-level embeddings spaces. We perform both intrinsic and extrinsic evaluations, and release CWN for the use and scrutiny of the community.

pdf bib

A Neural Network Architecture for Multilingual Punctuation Generation
Miguel Ballesteros | Leo Wanner
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib abs

Example-based Acquisition of Fine-grained Collocation Resources
Sara Rodríguez-Fernández | Roberto Carlini | Luis Espinosa Anke | Leo Wanner
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Collocations such as “heavy rain” or “make [a] decision”, are combinations of two elements where one (the base) is freely chosen, while the choice of the other (collocate) is restricted, depending on the base. Collocations present difficulties even to advanced language learners, who usually struggle to find the right collocate to express a particular meaning, e.g., both “heavy” and “strong” express the meaning ‘intense’, but while “rain” selects “heavy”, “wind” selects “strong”. Lexical Functions (LFs) describe the meanings that hold between the elements of collocations, such as ‘intense’, ‘perform’, ‘create’, ‘increase’, etc. Language resources with semantically classified collocations would be of great help for students, however they are expensive to build, since they are manually constructed, and scarce. We present an unsupervised approach to the acquisition and semantic classification of collocations according to LFs, based on word embeddings in which, given an example of a collocation for each of the target LFs and a set of bases, the system retrieves a list of collocates for each base and LF.

pdf bib abs

Praat on the Web: An Upgrade of Praat for Semi-Automatic Speech Annotation
Mónica Domínguez | Iván Latorre | Mireia Farrús | Joan Codina-Filbà | Leo Wanner
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

This paper presents an implementation of the widely used speech analysis tool Praat as a web application with an extended functionality for feature annotation. In particular, Praat on the Web addresses some of the central limitations of the original Praat tool and provides (i) enhanced visualization of annotations in a dedicated window for feature annotation at interval and point segments, (ii) a dynamic scripting composition exemplified with a modular prosody tagger, and (iii) portability and an operational web interface. Speech annotation tools with such a functionality are key for exploring large corpora and designing modular pipelines.

pdf bib

Semantics-Driven Recognition of Collocations Using Word Embeddings
Sara Rodríguez-Fernández | Luis Espinosa-Anke | Roberto Carlini | Leo Wanner
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib abs

Towards Multiple Antecedent Coreference Resolution in Specialized Discourse
Alicia Burga | Sergio Cajal | Joan Codina-Filbà | Leo Wanner
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Despite the popularity of coreference resolution as a research topic, the overwhelming majority of the work in this area focused so far on single antecedence coreference only. Multiple antecedent coreference (MAC) has been largely neglected. This can be explained by the scarcity of the phenomenon of MAC in generic discourse. However, in specialized discourse such as patents, MAC is very dominant. It seems thus unavoidable to address the problem of MAC resolution in the context of tasks related to automatic patent material processing, among them abstractive summarization, deep parsing of patents, construction of concept maps of the inventions, etc. We present the first version of an operational rule-based MAC resolution strategy for patent material that covers the three major types of MAC: (i) nominal MAC, (ii) MAC with personal / relative pronouns, and MAC with reflexive / reciprocal pronouns. The evaluation shows that our strategy performs well in terms of precision and recall.

pdf bib abs

A Semi-Supervised Approach for Gender Identification
Juan Soler | Leo Wanner
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In most of the research studies on Author Profiling, large quantities of correctly labeled data are used to train the models. However, this does not reflect the reality in forensic scenarios: in practical linguistic forensic investigations, the resources that are available to profile the author of a text are usually scarce. To pay tribute to this fact, we implemented a Semi-Supervised Learning variant of the k nearest neighbors algorithm that uses small sets of labeled data and a larger amount of unlabeled data to classify the authors of texts by gender (man vs woman). We describe the enriched KNN algorithm and show that the use of unlabeled instances improves the accuracy of our gender identification model. We also present a feature set that facilitates the use of a very small number of instances, reaching accuracies higher than 70% with only 113 instances to train the model. It is also shown that the algorithm also performs well using publicly available data.

2015

pdf bib

Towards a multi-layered dependency annotation of Finnish
Alicia Burga | Simon Mille | Anton Granvik | Leo Wanner
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

pdf bib

Classification of Lexical Collocation Errors in the Writings of Learners of Spanish
Sara Rodríguez-Fernández | Roberto Carlini | Leo Wanner
Proceedings of the International Conference Recent Advances in Natural Language Processing

pdf bib

Data-driven sentence generation with non-isomorphic trees
Miguel Ballesteros | Bernd Bohnet | Simon Mille | Leo Wanner
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib

Visualizing Deep-Syntactic Parser Output
Juan Soler-Company | Miguel Ballesteros | Bernd Bohnet | Simon Mille | Leo Wanner
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

2014

pdf bib abs

How to Use less Features and Reach Better Performance in Author Gender Identification
Juan Soler Company | Leo Wanner
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Over the last years, author profiling in general and author gender identification in particular have become a popular research area due to their potential attractive applications that range from forensic investigations to online marketing studies. However, nearly all state-of-the-art works in the area still very much depend on the datasets they were trained and tested on, since they heavily draw on content features, mostly a large number of recurrent words or combinations of words extracted from the training sets. We show that using a small number of features that mainly depend on the structure of the texts we can outperform other approaches that depend mainly on the content of the texts and that use a huge number of features in the process of identifying if the author of a text is a man or a woman. Our system has been tested against a dataset constructed for our work as well as against two datasets that were previously used in other papers.

pdf bib

Deep-Syntactic Parsing
Miguel Ballesteros | Bernd Bohnet | Simon Mille | Leo Wanner
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib

Classifiers for data-driven deep sentence generation
Miguel Ballesteros | Simon Mille | Leo Wanner
Proceedings of the 8th International Natural Language Generation Conference (INLG)

pdf bib

Improving Collocation Correction by Ranking Suggestions Using Linguistic Knowledge
Roberto Carlini | Joan Codina-Filba | Leo Wanner
Proceedings of the third workshop on NLP for computer-assisted language learning

pdf bib abs

An Exercise in Reuse of Resources: Adapting General Discourse Coreference Resolution for Detecting Lexical Chains in Patent Documentation
Nadjet Bouayad-Agha | Alicia Burga | Gerard Casamayor | Joan Codina | Rogelio Nazar | Leo Wanner
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Stanford Coreference Resolution System (StCR) is a multi-pass, rule-based system that scored best in the CoNLL 2011 shared task on general discourse coreference resolution. We describe how the StCR has been adapted to the specific domain of patents and give some cues on how it can be adapted to other domains. We present a linguistic analysis of the patent domain and how we were able to adapt the rules to the domain and to expand coreferences with some lexical chains. A comparative evaluation shows an improvement of the coreference resolution system, denoting that (i) StCR is a valuable tool across different text genres; (ii) specialized discourse NLP may significantly benefit from general discourse NLP research.

2013

pdf bib

AnCora-UPF: A Multi-Level Annotation of Spanish
Simon Mille | Alicia Burga | Leo Wanner
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

pdf bib

Towards the Annotation of Penn TreeBank with Information Structure
Bernd Bohnet | Alicia Burga | Leo Wanner
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib

Overview of the First Content Selection Challenge from Open Semantic Web Data
Nadjet Bouayad-Agha | Gerard Casamayor | Leo Wanner | Chris Mellish
Proceedings of the 14th European Workshop on Natural Language Generation

pdf bib

Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)
Eva Hajičová | Kim Gerdes | Leo Wanner
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

2012

pdf bib

How Does the Granularity of an Annotation Scheme Influence Dependency Parsing Performance?
Simon Mille | Alicia Burga | Gabriela Ferraro | Leo Wanner
Proceedings of COLING 2012: Posters

pdf bib

The Surface Realisation Task: Recent Developments and Future Plans
Anja Belz | Bernd Bohnet | Simon Mille | Leo Wanner | Michael White
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

pdf bib

Content Selection From Semantic Web Data
Nadjet Bouayad-Agha | Gerard Casamayor | Leo Wanner | Chris Mellish
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

pdf bib

Towards a Surface Realization-Oriented Corpus Annotation
Leo Wanner | Simon Mille | Bernd Bohnet
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

2011

pdf bib

<StuMaBa>: From Deep Representation to Surface
Bernd Bohnet | Simon Mille | Benoît Favre | Leo Wanner
Proceedings of the 13th European Workshop on Natural Language Generation

pdf bib

Content selection from an ontology-based knowledge base for the generation of football summaries
Nadjet Bouayad-Agha | Gerard Casamayor | Leo Wanner
Proceedings of the 13th European Workshop on Natural Language Generation

2010

pdf bib abs

Towards a Motivated Annotation Schema of Collocation Errors in Learner Corpora
Margarita Alonso Ramos | Leo Wanner | Orsolya Vincze | Gerard Casamayor del Bosque | Nancy Vázquez Veiga | Estela Mosqueira Suárez | Sabela Prieto González
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Collocations play a significant role in second language acquisition. In order to be able to offer efficient support to learners, an NLP-based CALL environment for learning collocations should be based on a representative collocation error annotated learner corpus. However, so far, no theoretically-motivated collocation error tag set is available. Existing learner corpora tag collocation errors simply as lexical errors ― which is clearly insufficient given the wide range of different collocation errors that the learners make. In this paper, we present a fine-grained three-dimensional typology of collocation errors that has been derived in an empirical study from the learner corpus CEDEL2 compiled by a team at the Autonomous University of Madrid. The first dimension captures whether the error concerns the collocation as a whole or one of its elements; the second dimension captures the language-oriented error analysis, while the third exemplifies the interpretative error analysis. To facilitate a smooth annotation along this typology, we adapted Knowtator, a flexible off-the-shelf annotation tool implemented as a Protégé plugin.

pdf bib abs

Syntactic Dependencies for Multilingual and Multilevel Corpus Annotation
Simon Mille | Leo Wanner
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The relevance of syntactic dependency annotated corpora is nowadays unquestioned. However, a broad debate on the optimal set of dependency relation tags did not take place yet. As a result, largely varying tag sets of a largely varying size are used in different annotation initiatives. We propose a hierarchical dependency structure annotation schema that is more detailed and more flexible than the known annotation schemata. The schema allows us to choose the level of the desired detail of annotation, which facilitates the use of the schema for corpus annotation for different languages and for different NLP applications. Thanks to the inclusion of semantico-syntactic tags into the schema, we can annotate a corpus not only with syntactic dependency structures, but also with valency patterns as they are usually found in separate treebanks such as PropBank and NomBank. Semantico-syntactic tags and the level of detail of the schema furthermore facilitate the derivation of deep-syntactic and semantic annotations, leading to truly multilevel annotated dependency corpora. Such multilevel annotations can be readily used for the task of ML-based acquisition of grammar resources that map between the different levels of linguistic representation ― something which forms part of, for instance, any natural language text generator.

pdf bib

Broad Coverage Multilingual Deep Sentence Generation with a Stochastic Multi-Level Realizer
Bernd Bohnet | Leo Wanner | Simon Mille | Alicia Burga
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib abs

Open Soucre Graph Transducer Interpreter and Grammar Development Environment
Bernd Bohnet | Leo Wanner
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Graph and tree transducers have been applied in many NLP areas―among them, machine translation, summarization, parsing, and text generation. In particular, the successful use of tree rewriting transducers for the introduction of syntactic structures in statistical machine translation contributed to their popularity. However, the potential of such transducers is limited because they do not handle graphs and because they consume the source structure in that they rewrite it instead of leaving it intact for intermediate consultations. In this paper, we describe an open source tree and graph transducer interpreter, which combines the advantages of graph transducers and two-tape Finite State Transducers and surpasses the limitations of state-of-the-art tree rewriting transducers. Along with the transducer, we present a graph grammar development environment that supports the compilation and maintenance of graph transducer grammatical and lexical resources. Such an environment is indispensable for any effort to create consistent large coverage NLP-resources by human experts.

2008

pdf bib

Multilingual summarization in practice: the case of patent claims
Simon Mille | Leo Wanner
Proceedings of the 12th Annual Conference of the European Association for Machine Translation

pdf bib abs

Using Semantically Annotated Corpora to Build Collocation Resources
Margarita Alonso Ramos | Owen Rambow | Leo Wanner
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present an experiment in extracting collocations from the FrameNet corpus, specifically, support verbs such as direct in Environmentalists directed strong criticism at world leaders. Support verbs do not contribute meaning of their own and the meaning of the construction is provided by the noun; the recognition of support verbs is thus useful in text understanding. Having access to a list of support verbs is also useful in applications that can benefit from paraphrasing, such as generation (where paraphrasing can provide variety). This paper starts with a brief presentation of the notion of lexical function in Meaning-Text Theory, where they fall under the notion of lexical function, and then discusses how relevant information is encoded in the FrameNet corpus. We describe the resource extracted from the FrameNet corpus.

pdf bib abs

Making Text Resources Accessible to the Reader: the Case of Patent Claims
Simon Mille | Leo Wanner
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Hardly any other kind of text structures is as notoriously difficult to read as patents. This is first of all due to their abstract vocabulary and their very complex syntactic constructions. Especially the claims in a patent are a challenge: in accordance with international patent writing regulations, each claim must be rendered in a single sentence. As a result, sentences with more than 200 words are not uncommon. Therefore, paraphrasing of the claims in terms the user can understand is of high demand. We present a rule-based paraphrasing module that realizes paraphrasing of patent claims in English as a rewriting task. Prior to the rewriting proper, the module implies the stages of simplification and discourse and syntactic analyses. The rewriting makes use of a full-fledged text generator and consists in a number of genuine generation tasks such as aggregation, selection of referring expressions, choice of discourse markers and syntactic generation. As generator, we use the MATE-work bench, which is based on the Meaning-Text Theory of linguistics.

pdf bib

Two-step flow in bilingual lexicon extraction from unrelated corpora
Rogelio Nazar | Leo Wanner | Jorge Vivaldi
Proceedings of the 12th Annual Conference of the European Association for Machine Translation

2006

pdf bib abs

Local Document Relevance Clustering in IR Using Collocation Information
Leo Wanner | Margarita Alonso Ramos
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

A series of different automatic query expansion techniques has been suggested in Information Retrieval. To estimate how suitable a document term is as an expansion term, the most popular of them use a measure of the frequency of the co-occurrence of this term with one or several query terms. The benefit of the use of the linguistic relations that hold between query terms is often questioned. If a linguistic phenomenon is taken into account, it is the phrase structure or lexical compound. We propose a technique that is based on the restricted lexical cooccurrence (collocation) of query terms. We use the knowledge on collocations formed by query terms for two tasks: (i) document relevance clustering done in the first stage of local query expansion and (ii) choice of suitable expansion terms from the relevant document cluster. In this paper, we describe the first task, providing evidence from first preliminary experiments on Spanish material that local relevance clustering benefits largely from knowledge on collocations.

Leo Wanner

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2008

2006

2004

2003

2001

2000

1998

1996

1994

1990

Co-authors

Venues