Statistical Deep Parsing for Spanish Using Neural Networks
Luis Chiruzzo
Dina Wonsever
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies
This paper presents the development of a deep parser for Spanish that uses an HPSG grammar and returns trees that contain both syntactic and semantic information. The parsing process uses a top-down approach implemented with LSTM neural networks, and achieves good results in terms of syntactic constituency and dependency metrics, as well as semantic role labeling (SRL). We describe the grammar, corpus and implementation of the parser. Our process outperforms a CKY baseline and other Spanish parsers in terms of global metrics and also for some specific Spanish phenomena, such as clitic reduplication and relative referents.
Supervised Hypernymy Detection in Spanish through Order Embeddings
Gun Woo Lee
Mathias Etcheverry
Daniel Fernandez Sanchez
Dina Wonsever
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)
This paper addresses the task of supervised hypernymy detection in Spanish through an order embedding that uses pretrained word vectors as input. Although the task has been widely addressed in English, there is not much work in Spanish, and to our knowledge there is no available dataset for supervised hypernymy detection in Spanish. We built a supervised hypernymy dataset for Spanish from WordNet and corpus statistics, with two versions that differ in the lexical intersection between their partitions: a random split and a lexical split. We report the results of using the resulting dataset with an order embedding that consumes pretrained word vectors as input. The results on the lexical split dataset show that pretrained word vectors can transfer learning to unseen lexical units. Finally, we study the effect of providing additional information at training time, such as cohyponym links and instances extracted through patterns.
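As an illustration of the order embedding idea mentioned in this abstract, the sketch below computes the order-violation penalty in the formulation commonly attributed to Vendrov et al. (2016), where a hypernym is expected to have coordinates no larger than those of its hyponyms. This is an assumption made for illustration: the abstract does not state which formulation the authors use, and the function name and the toy vectors are hypothetical.

```python
def order_violation(hyponym_vec, hypernym_vec):
    """Squared norm of the positive part of (hypernym - hyponym).

    Assumed convention (reversed product order): the pair is consistent
    when every hypernym coordinate is <= the matching hyponym coordinate,
    in which case the penalty is 0.
    """
    return sum(max(0.0, h - s) ** 2 for s, h in zip(hyponym_vec, hypernym_vec))


# Toy 2-d embeddings, invented for illustration only.
perro = [2.0, 3.0]   # "dog", the more specific term
animal = [1.0, 1.0]  # "animal", the more general term

correct_direction = order_violation(perro, animal)  # perro IS-A animal
wrong_direction = order_violation(animal, perro)    # animal IS-A perro
```

Under this convention a trained model would assign a low penalty to true hypernymy pairs and a higher penalty to reversed or unrelated pairs, so a threshold on the penalty can serve as the detector.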
Unraveling Antonym’s Word Vectors through a Siamese-like Network
Mathias Etcheverry
Dina Wonsever
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Discriminating antonyms from synonyms is an important NLP task, made difficult by the fact that both antonyms and synonyms carry similar distributional information. Consequently, pairs of antonyms and pairs of synonyms may have similar word vectors. We present an approach, inspired by siamese networks, to unravel antonymy and synonymy from word vectors. The model consists of a two-phase training of the same base network: a pre-training phase using a siamese model supervised by synonyms, and a training phase on antonyms using a siamese-like model that supports the antitransitivity present in antonymy. The approach makes use of the claim that the antonyms that a word has in common tend to be synonyms. We show that our approach outperforms distributional and pattern-based approaches, relying on a simple feed-forward network as the base network of the training phases.
Using Context to Improve the Spanish WordNet Translation
Alfonso Methol
Guillermo López
Juan Álvarez
Luis Chiruzzo
Dina Wonsever
Proceedings of the 9th Global Wordnet Conference
We present some strategies for improving the Spanish version of WordNet, part of the MCR, by selecting new lemmas for the Spanish synsets from translations of the lemmas of the corresponding English synsets. We used four simple selectors that considerably improved the coverage of the Spanish WordNet, but with relatively lower precision. We then defined two context-based selectors that improved the precision of the translations.
Spanish HPSG Treebank based on the AnCora Corpus
Luis Chiruzzo
Dina Wonsever
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Some strategies for the improvement of a Spanish WordNet
Matias Herrera
Javier Gonzalez
Luis Chiruzzo
Dina Wonsever
Proceedings of the 8th Global WordNet Conference (GWC)
Although there are currently several versions of Princeton WordNet for different languages, some of these versions are not developed enough to be used in different Natural Language Processing applications. Such is the case of the Spanish WordNet contained in the Multilingual Central Repository (MCR), which we tried unsuccessfully to incorporate into an anaphora resolution application and into search term expansion. In this situation, different strategies to improve the coverage of the MCR Spanish WordNet were put forward and tested, obtaining encouraging results. A specific process was conducted to increase the number of adverbs, and a few simple processes were applied that increased, at a very low cost, the number of terms in the Spanish WordNet. Finally, a more complex method based on distributional semantics was proposed, using the relations between English WordNet synsets, which also returned positive results.
Factuality Annotation and Learning in Spanish Texts
Dina Wonsever
Aiala Rosá
Marisa Malcuori
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We present a proposal for the annotation of factuality of event mentions in Spanish texts and a freely available annotated corpus. Our factuality model aims to capture a pragmatic notion of factuality, trying to reflect a casual reader's judgements about the realis / irrealis status of mentioned events. In addition, some learning experiments (SVM and CRF) were carried out, showing encouraging results.
Spanish Word Vectors from Wikipedia
Mathias Etcheverry
Dina Wonsever
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Content analysis of text data requires semantic representations that are difficult to obtain automatically, as they may require large handcrafted knowledge bases or manually annotated examples. Unsupervised methods for generating semantic representations are of great interest given the huge volumes of text to be exploited in all kinds of applications. In this work we describe the generation and validation of semantic representations in the vector space paradigm for Spanish. The method used is GloVe (Pennington, 2014), one of the best performing reported methods, and vectors were trained on Spanish Wikipedia. The learned vectors are evaluated on word analogy and similarity tasks (Pennington, 2014; Baroni, 2014; Mikolov, 2013a). The vector set and a Spanish version of some widely used semantic relatedness tests are made publicly available.
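To make the word analogy evaluation mentioned in this abstract concrete, the sketch below answers an analogy question "a is to b as c is to ?" with the widely used vector offset method: it picks the word whose vector is closest, by cosine similarity, to b - a + c. The function names and the tiny two-dimensional vectors are hypothetical and invented for illustration; they are not the published vectors, and the abstract does not specify the exact scoring procedure the authors applied.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Return the word d that best completes 'a is to b as c is to d'."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three question words are excluded from the candidates,
    # as is customary in this kind of evaluation.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy 2-d vectors, invented for illustration only.
toy_vectors = {
    "hombre": [1.0, 0.0],
    "mujer": [1.0, 1.0],
    "rey": [3.0, 0.2],
    "reina": [3.0, 1.2],
    "mesa": [-1.0, 0.5],
}

answer = solve_analogy(toy_vectors, "hombre", "mujer", "rey")
```

Accuracy on an analogy test set is then the fraction of questions for which the returned word matches the expected one.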
Adaptation of a Rule-Based Translator to Río de la Plata Spanish
Ernesto López
Luis Chiruzzo
Dina Wonsever
Proceedings of the Workshop on Adaptation of Language Resources and Tools for Closely Related Languages and Language Variants
Improving Speculative Language Detection using Linguistic Knowledge
Guillermo Moncecchi
Jean-Luc Minel
Dina Wonsever
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics
Opinion Identification in Spanish Texts
Aiala Rosá
Dina Wonsever
Jean-Luc Minel
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas