Olatz Arregi

2020

pdf bib abs
Sequence to Sequence Coreference Resolution
Gorka Urbizu | Ander Soraluze | Olatz Arregi
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

Until recently, coreference resolution has been a critical task on the pipeline of any NLP task involving deep language understanding, such as machine translation, chatbots, summarization or sentiment analysis. However, nowadays, those end tasks are learned end-to-end by deep neural networks without adding any explicit knowledge about coreference. Thus, coreference resolution is used less in the training of other NLP tasks or trending pretrained language models. In this paper we present a new approach to face coreference resolution as a sequence to sequence task based on the Transformer architecture. This approach is simple and universal, compatible with any language or dataset (regardless of singletons) and easier to integrate with current language models architectures. We test it on the ARRAU corpus, where we get 65.6 F1 CoNLL. We see this approach not as a final goal, but a means to pretrain sequence to sequence language models (T5) on coreference resolution.

2019

pdf bib abs
Deep Cross-Lingual Coreference Resolution for Less-Resourced Languages: The Case of Basque
Gorka Urbizu | Ander Soraluze | Olatz Arregi
Proceedings of the Second Workshop on Computational Models of Reference, Anaphora and Coreference

In this paper, we present a cross-lingual neural coreference resolution system for a less-resourced language such as Basque. To begin with, we build the first neural coreference resolution system for Basque, training it with the relatively small EPEC-KORREF corpus (45,000 words). Next, a cross-lingual coreference resolution system is designed. With this approach, the system learns from a bigger English corpus, using cross-lingual embeddings, to perform the coreference resolution for Basque. The cross-lingual system obtains slightly better results (40.93 F1 CoNLL) than the monolingual system (39.12 F1 CoNLL), without using any Basque language corpus to train it.

2017

pdf bib abs
Enriching Basque Coreference Resolution System using Semantic Knowledge sources
Ander Soraluze | Olatz Arregi | Xabier Arregi | Arantza Díaz de Ilarraza
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

In this paper we present a Basque coreference resolution system enriched with semantic knowledge. An error analysis carried out revealed the deficiencies that the system had in resolving coreference cases in which semantic or world knowledge is needed. We attempt to improve the deficiencies using two semantic knowledge sources, specifically Wikipedia and WordNet.

In this paper, we present the first Semantic Role Labeling system developed for Basque. The system is implemented using machine learning techniques and trained with the Reference Corpus for the Processing of Basque (EPEC). In our experiments the classifier that offers the best results is based on Support Vector Machines. Our system achieves 84.30 F1 score in identifying the PropBank semantic role for a given constituent and 82.90 F1 score in identifying the VerbNet role. Our study establishes a baseline for Basque SRL. Although there are no directly comparable systems for English we can state that the results we have achieved are quite good. In addition, we have performed a Leave-One-Out feature selection procedure in order to establish which features are the worthiest regarding argument classification. This will help smooth the way for future stages of Basque SRL and will help draw some of the guidelines of our research.