Aina Garí Soler


2021

pdf bib
Scalar Adjective Identification and Multilingual Ranking
Aina Garí Soler | Marianna Apidianaki
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The intensity relationship that holds between scalar adjectives (e.g., nice < great < wonderful) is highly relevant for natural language inference and common-sense reasoning. Previous research on scalar adjective ranking has focused on English, mainly due to the availability of datasets for evaluation. We introduce a new multilingual dataset in order to promote research on scalar adjectives in new languages. We perform a series of experiments and set performance baselines on this dataset, using monolingual and multilingual contextual language models. Additionally, we introduce a new binary classification task for English scalar adjective identification which examines the models’ ability to distinguish scalar from relational adjectives. We probe contextualised representations and report baseline results for future comparison on this task.

2020

pdf bib
BERT Knows Punta Cana is not just beautiful, it’s gorgeous: Ranking Scalar Adjectives with Contextualised Representations
Aina Garí Soler | Marianna Apidianaki
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Adjectives like pretty, beautiful and gorgeous describe positive properties of the nouns they modify but with different intensity. These differences are important for natural language understanding and reasoning. We propose a novel BERT-based approach to intensity detection for scalar adjectives. We model intensity by vectors directly derived from contextualised representations and show they can successfully rank scalar adjectives. We evaluate our models both intrinsically, on gold standard datasets, and on an Indirect Question Answering task. Our results demonstrate that BERT encodes rich knowledge about the semantics of scalar adjectives, and is able to provide better quality intensity rankings than static embeddings and previous models with access to dedicated resources.

pdf bib
MULTISEM at SemEval-2020 Task 3: Fine-tuning BERT for Lexical Meaning
Aina Garí Soler | Marianna Apidianaki
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We present the MULTISEM systems submitted to SemEval 2020 Task 3: Graded Word Similarity in Context (GWSC). We experiment with injecting semantic knowledge into pre-trained BERT models through fine-tuning on lexical semantic tasks related to GWSC. We use existing semantically annotated datasets, and propose to approximate similarity through automatically generated lexical substitutes in context. We participate in both GWSC subtasks and address two languages, English and Finnish. Our best English models occupy the third and fourth positions in the ranking for the two subtasks. Performance is lower for the Finnish models which are mid-ranked in the respective subtasks, highlighting the important role of data availability for fine-tuning.

2019

pdf bib
A Comparison of Context-sensitive Models for Lexical Substitution
Aina Garí Soler | Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

Word embedding representations provide good estimates of word meaning and give state-of-the art performance in semantic tasks. Embedding approaches differ as to whether and how they account for the context surrounding a word. We present a comparison of different word and context representations on the task of proposing substitutes for a target word in context (lexical substitution). We also experiment with tuning contextualized word embeddings on a dataset of sense-specific instances for each target word. We show that powerful contextualized word representations, which give high performance in several semantics-related tasks, deal less well with the subtle in-context similarity relationships needed for substitution. This is better handled by models trained with this objective in mind, where the inter-dependence between word and context representations is explicitly modeled during training.

pdf bib
LIMSI-MULTISEM at the IJCAI SemDeep-5 WiC Challenge: Context Representations for Word Usage Similarity Estimation
Aina Garí Soler | Marianna Apidianaki | Alexandre Allauzen
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)

pdf bib
Word Usage Similarity Estimation with Sentence Representations and Automatic Substitutes
Aina Garí Soler | Marianna Apidianaki | Alexandre Allauzen
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Usage similarity estimation addresses the semantic proximity of word instances in different contexts. We apply contextualized (ELMo and BERT) word and sentence embeddings to this task, and propose supervised models that leverage these representations for prediction. Our models are further assisted by lexical substitute annotations automatically assigned to word instances by context2vec, a neural model that relies on a bidirectional LSTM. We perform an extensive comparison of existing word and sentence representations on benchmark datasets addressing both graded and binary similarity.The best performing models outperform previous methods in both settings.

pdf bib
Exploring sentence informativeness
Syrielle Montariol | Aina Garí Soler | Alexandre Allauzen
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

This study is a preliminary exploration of the concept of informativeness –how much information a sentence gives about a word it contains– and its potential benefits to building quality word representations from scarce data. We propose several sentence-level classifiers to predict informativeness, and we perform a manual annotation on a set of sentences. We conclude that these two measures correspond to different notions of informativeness. However, our experiments show that using the classifiers’ predictions to train word embeddings has an impact on embedding quality.

2018

pdf bib
A comparative study of word embeddings and other features for lexical complexity detection in French
Aina Garí Soler | Marianna Apidianaki | Alexandre Allauzen
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Lexical complexity detection is an important step for automatic text simplification which serves to make informed lexical substitutions. In this study, we experiment with word embeddings for measuring the complexity of French words and combine them with other features that have been shown to be well-suited for complexity prediction. Our results on a synonym ranking task show that embeddings perform better than other features in isolation, but do not outperform frequency-based systems in this language.