2019
Unsupervised Compositionality Prediction of Nominal Compounds
Silvio Cordeiro | Aline Villavicencio | Marco Idiart | Carlos Ramisch
Computational Linguistics, Volume 45, Issue 1 - March 2019
Nominal compounds such as red wine and nut case display a continuum of compositionality, with varying contributions from the components of the compound to its semantics. This article proposes a framework for compound compositionality prediction using distributional semantic models, evaluating to what extent they capture idiomaticity compared to human judgments. For evaluation, we introduce data sets containing human judgments in three languages: English, French, and Portuguese. The results obtained reveal a high agreement between model predictions and human judgments, suggesting that the models are able to incorporate information about idiomaticity. We also present an in-depth evaluation of various factors that can affect prediction, such as model and corpus parameters and compositionality operations. General crosslingual analyses reveal the impact of morphological variation and corpus size on the ability of the model to predict compositionality, and of a uniform combination of the components for best results.
Without lexicons, multiword expression identification will never fly: A position statement
Agata Savary | Silvio Cordeiro | Carlos Ramisch
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
Because most multiword expressions (MWEs), especially verbal ones, are semantically non-compositional, their automatic identification in running text is a prerequisite for semantically-oriented downstream applications. However, recent developments, driven notably by the PARSEME shared task on automatic identification of verbal MWEs, show that this task is harder than related tasks, despite recent contributions both in multilingual corpus annotation and in computational models. In this paper, we analyse possible reasons for this state of affairs. They lie in the nature of the MWE phenomenon, as well as in its distributional properties. We also offer a comparative analysis of the state-of-the-art systems, which exhibit particularly strong sensitivity to unseen data. On this basis, we claim that, in order to make strong headway in MWE identification, the community should turn its attention to coupling identification of MWEs with their discovery, via syntactic MWE lexicons. Such lexicons need not necessarily achieve a linguistically complete modelling of MWEs’ behavior, but they should provide minimal morphosyntactic information to cover some potential uses, so as to complement existing MWE-annotated corpora. We define requirements for such a minimal NLP-oriented lexicon, and we propose a roadmap for the MWE community driven by these requirements.
Syntax-based identification of light-verb constructions
Silvio Ricardo Cordeiro | Marie Candito
Proceedings of the 22nd Nordic Conference on Computational Linguistics
This paper analyzes results on light-verb construction identification from the PARSEME shared task, distinguishing simple cases that could be directly learned from training data from more complex cases that require an extra level of semantic processing. We propose a simple baseline that beats the state of the art for the simple cases, and couple it with another simple baseline to handle the complex cases. We additionally present two other classifiers based on a richer set of features, with results surpassing the state of the art by 8 percentage points.
2018
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Silvio Ricardo Cordeiro | Shereen Oraby | Umashanthi Pavalanathan | Kyeongmin Rim
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Carlos Ramisch | Silvio Ricardo Cordeiro | Agata Savary | Veronika Vincze | Verginica Barbu Mititelu | Archna Bhatia | Maja Buljan | Marie Candito | Polona Gantar | Voula Giouli | Tunga Güngör | Abdelati Hawwari | Uxoa Iñurrieta | Jolanta Kovalevskaitė | Simon Krek | Timm Lichte | Chaya Liebeskind | Johanna Monti | Carla Parra Escartín | Behrang QasemiZadeh | Renata Ramisch | Nathan Schneider | Ivelina Stoyanova | Ashwini Vaidya | Abigail Walsh
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.
2017
The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Agata Savary | Carlos Ramisch | Silvio Cordeiro | Federico Sangati | Veronika Vincze | Behrang QasemiZadeh | Marie Candito | Fabienne Cap | Voula Giouli | Ivelina Stoyanova | Antoine Doucet
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Multiword expressions (MWEs) are known as a “pain in the neck” for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one’s heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as “words with spaces”. We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.
LexSubNC: A Dataset of Lexical Substitution for Nominal Compounds
Rodrigo Wilkens | Leonardo Zilio | Silvio Ricardo Cordeiro | Felipe Paula | Carlos Ramisch | Marco Idiart | Aline Villavicencio
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers
Literal readings of multiword expressions: as scarce as hen’s teeth
Agata Savary | Silvio Ricardo Cordeiro
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories
2016
UFRGS&LIF at SemEval-2016 Task 10: Rule-Based MWE Identification and Predominant-Supersense Tagging
Silvio Cordeiro | Carlos Ramisch | Aline Villavicencio
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
Filtering and Measuring the Intrinsic Quality of Human Compositionality Judgments
Carlos Ramisch | Silvio Cordeiro | Aline Villavicencio
Proceedings of the 12th Workshop on Multiword Expressions
mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing
Silvio Cordeiro | Carlos Ramisch | Aline Villavicencio
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper presents mwetoolkit+sem: an extension of the mwetoolkit that estimates semantic compositionality scores for multiword expressions (MWEs) based on word embeddings. First, we describe our implementation of vector-space operations working on distributional vectors. The compositionality score is based on the cosine distance between the MWE vector and the composition of the vectors of its member words. Our generic system can handle several types of word embeddings and MWE lists, and may combine individual word representations using several composition techniques. We evaluate our implementation on a dataset of 1042 English noun compounds, comparing different configurations of the underlying word embeddings and word-composition models. We show that our vector-based scores model non-compositionality better than standard association measures such as log-likelihood.
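The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the mwetoolkit+sem implementation: the function names, the toy four-dimensional vectors, and the choice of composition function (a sum of unit-normalised member vectors) are all assumptions made for illustration. The abstract speaks of cosine distance; the sketch reports cosine similarity, which is one minus that distance, so that a higher score means a more compositional expression.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)


def compose_additive(member_vectors):
    """Assumed composition function: element-wise sum of unit-normalised member vectors."""
    unit_vectors = [normalize(v) for v in member_vectors]
    return [sum(components) for components in zip(*unit_vectors)]


def compositionality_score(mwe_vector, member_vectors):
    """Similarity between the MWE's own vector and the composition of its members."""
    return cosine(mwe_vector, compose_additive(member_vectors))


# Hypothetical toy embeddings, invented for this example only.
toy_vectors = {
    "red": [1.0, 0.0, 0.0, 0.0],
    "wine": [0.0, 1.0, 0.0, 0.0],
    "red_wine": [0.6, 0.8, 0.0, 0.0],  # close to its members
    "nut": [0.0, 0.0, 1.0, 0.0],
    "case": [0.0, 1.0, 0.0, 0.0],
    "nut_case": [0.1, 0.0, 0.1, 0.9],  # far from its members
}

score_red_wine = compositionality_score(
    toy_vectors["red_wine"], [toy_vectors["red"], toy_vectors["wine"]]
)
score_nut_case = compositionality_score(
    toy_vectors["nut_case"], [toy_vectors["nut"], toy_vectors["case"]]
)
print(f"red wine: {score_red_wine:.3f}")
print(f"nut case: {score_nut_case:.3f}")
```

With these invented vectors, the compositional compound scores far higher than the idiomatic one. In practice the vectors would come from a trained embedding model in which each MWE is treated as a single token, and other composition functions could be swapped in for `compose_additive`.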
Predicting the Compositionality of Nominal Compounds: Giving Word Embeddings a Hard Time
Silvio Cordeiro | Carlos Ramisch | Marco Idiart | Aline Villavicencio
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
How Naked is the Naked Truth? A Multilingual Lexicon of Nominal Compound Compositionality
Carlos Ramisch | Silvio Cordeiro | Leonardo Zilio | Marco Idiart | Aline Villavicencio
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)