Mathias Etcheverry


2020

Supervised Hypernymy Detection in Spanish through Order Embeddings
Gun Woo Lee | Mathias Etcheverry | Daniel Fernandez Sanchez | Dina Wonsever
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)

This paper addresses the task of supervised hypernymy detection in Spanish through an order embedding that takes pretrained word vectors as input. Although the task has been widely addressed in English, there is little work in Spanish, and to our knowledge there is no available dataset for supervised hypernymy detection in Spanish. We built a supervised hypernymy dataset for Spanish from WordNet and corpus statistics, with different versions according to the lexical intersection between its partitions: random split and lexical split. We report the results of using this dataset with an order embedding that consumes pretrained word vectors as input. The results on the lexical split dataset show that pretrained word vectors allow learning to transfer to unseen lexical units. Finally, we study the effect of providing additional information at training time, such as cohyponym links and instances extracted through patterns.
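As a rough illustration of the order embedding idea mentioned in the abstract above, the sketch below computes an order-violation penalty of the kind commonly used with order embeddings. The function name, the toy vectors, and the convention that a hyponym must be coordinate-wise greater than or equal to its hypernym are assumptions made for illustration; this is not the paper's implementation.

```python
def order_violation(hyponym_vec, hypernym_vec):
    """Squared norm of max(0, hypernym - hyponym), taken coordinate-wise.

    Assumed convention: the pair is perfectly ordered (penalty 0) when every
    coordinate of the hyponym is >= the matching coordinate of the hypernym.
    """
    return sum(max(0.0, h - s) ** 2 for s, h in zip(hyponym_vec, hypernym_vec))


# Hypothetical toy embeddings, not taken from any trained model.
perro = [2.0, 3.0, 1.5]   # "dog" (more specific)
animal = [1.0, 2.0, 0.5]  # "animal" (more general)

print(order_violation(perro, animal))  # ordered pair: no violation
print(order_violation(animal, perro))  # reversed pair: positive penalty
```

A detector built on such a penalty would typically predict hypernymy when the penalty falls below a threshold tuned on validation data.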

2019

Unraveling Antonym’s Word Vectors through a Siamese-like Network
Mathias Etcheverry | Dina Wonsever
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Discriminating antonyms from synonyms is an important NLP task, made difficult by the fact that both antonyms and synonyms contain similar distributional information. Consequently, pairs of antonyms and pairs of synonyms may have similar word vectors. We present an approach, inspired by siamese networks, to unravel antonymy and synonymy from word vectors. The model consists of a two-phase training of the same base network: a pre-training phase following a siamese model supervised by synonyms, and a training phase on antonyms through a siamese-like model that supports the antitransitivity present in antonymy. The approach relies on the claim that words sharing a common antonym tend to be synonyms. We show that our approach outperforms distributional and pattern-based approaches, relying on a simple feed-forward network as the base network for both training phases.

2016

Spanish Word Vectors from Wikipedia
Mathias Etcheverry | Dina Wonsever
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Content analysis of text data requires semantic representations that are difficult to obtain automatically, as they may require large handcrafted knowledge bases or manually annotated examples. Unsupervised autonomous methods for generating semantic representations are of great interest given the huge volumes of text to be exploited in all kinds of applications. In this work we describe the generation and validation of semantic representations in the vector space paradigm for Spanish. The method used is GloVe (Pennington, 2014), one of the best performing reported methods, and the vectors were trained on Spanish Wikipedia. The learned vectors are evaluated on word analogy and similarity tasks (Pennington, 2014; Baroni, 2014; Mikolov, 2013a). The vector set and a Spanish version of some widely used semantic relatedness tests are made publicly available.
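To make the word analogy evaluation mentioned in the abstract above more concrete, here is a minimal sketch of the usual vector-offset procedure: given "a is to b as c is to ?", pick the vocabulary word whose vector is closest in cosine similarity to b - a + c. The toy three-dimensional vectors and the helper names are hypothetical; real evaluations would use the trained Spanish vectors and a full analogy test set.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# Hypothetical toy vectors (dimensions: royalty, male, female).
toy = {
    "hombre": [0.0, 1.0, 0.0],
    "mujer": [0.0, 0.0, 1.0],
    "rey": [1.0, 1.0, 0.0],
    "reina": [1.0, 0.0, 1.0],
    "casa": [0.0, 0.5, 0.5],
}

# "hombre" is to "rey" as "mujer" is to ...?
print(solve_analogy(toy, "hombre", "rey", "mujer"))
```

Accuracy on an analogy test set is then the fraction of questions for which the returned word matches the expected answer.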