Arantxa Otegi


TZOS: an Online Terminology Database Aimed at Working on Basque Academic Terminology Collaboratively
Izaskun Aldezabal | Jose Mari Arriola | Arantxa Otegi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Terminology databases are highly useful for the dissemination of specialized knowledge. In this paper we present TZOS, an online terminology database for working collaboratively on Basque academic terminology. We show how this resource integrates the Communicative Theory of Terminology together with methodological matters, how it is connected with the real corpus GARATERM, which terminology issues arise when terms are collected, and future perspectives. The main objectives of this work are to develop basic tools for researching academic registers and to make the terminology collected by expert users available to the community. Even though TZOS has been designed for an educational context, its flexible structure makes it possible to extend it to the professional area as well. Accordingly, we have built IZIBI-TZOS, a Civil Engineering oriented version of TZOS. These resources are already publicly available, and ongoing work aims at interlinking them with other lexical resources by applying linked data principles.


Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque
Arantxa Otegi | Aitor Agirre | Jon Ander Campos | Aitor Soroa | Eneko Agirre
Proceedings of the Twelfth Language Resources and Evaluation Conference

Conversational Question Answering (CQA) systems meet user information needs by having conversations with them, where answers to the questions are retrieved from text. A variety of datasets exist for English, with tens of thousands of training examples, and pre-trained language models have made it possible to obtain impressive results. The goal of our research is to test the performance of CQA systems under low-resource conditions, which are common for most non-English languages: small amounts of native annotations and other limitations linked to low-resource languages, like a lack of crowdworkers or smaller Wikipedias. We focus on the Basque language, and present the first non-English CQA dataset and results. Our experiments show that it is possible to obtain good results with small amounts of native data thanks to cross-lingual transfer, with quality comparable to that obtained for English. We also discovered that dialogue history models are not directly transferable to another language, calling for further research. The dataset is publicly available.

DoQA - Accessing Domain-Specific FAQs via Conversational QA
Jon Ander Campos | Arantxa Otegi | Aitor Soroa | Jan Deriu | Mark Cieliebak | Eneko Agirre
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The goal of this work is to build conversational Question Answering (QA) interfaces for the large body of domain-specific information available in FAQ sites. We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs. The dialogues are collected from three Stack Exchange sites using the Wizard of Oz method with crowdsourcing. Compared to previous work, DoQA is multi-domain and comprises well-defined information needs, leading to more coherent and natural conversations with fewer factoid questions. In addition, we introduce a more realistic information retrieval (IR) scenario where the system needs to find the answer in any of the FAQ documents. The results of an existing strong system show that, thanks to transfer learning from a Wikipedia QA dataset and fine-tuning on a single FAQ domain, it is possible to build high-quality conversational QA systems for FAQs without in-domain training data. The good results carry over into the more challenging IR scenario. In both cases, there is still ample room for improvement, as indicated by the higher human upper bound.

Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning
Jon Ander Campos | Kyunghyun Cho | Arantxa Otegi | Aitor Soroa | Eneko Agirre | Gorka Azkune
Proceedings of the 28th International Conference on Computational Linguistics

The interaction of conversational systems with users poses an exciting opportunity for improving them after deployment, but little evidence has been provided of its feasibility. In most applications, users are not able to provide the correct answer to the system, but they are able to provide binary (correct, incorrect) feedback. In this paper we propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback. We perform simulated experiments on document classification (for development) and on Conversational Question Answering datasets like QuAC and DoQA, where binary user feedback is derived from gold annotations. The results show that our method is able to improve over the initial supervised system, getting close to a fully-supervised system that has access to the same labeled examples in in-domain experiments (QuAC), and even matching it in out-of-domain experiments (DoQA). Our work opens up the prospect of exploiting interactions with real users to improve conversational systems after deployment.

Automatic Evaluation vs. User Preference in Neural Textual Question Answering over COVID-19 Scientific Literature
Arantxa Otegi | Jon Ander Campos | Gorka Azkune | Aitor Soroa | Eneko Agirre
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

We present a Question Answering (QA) system that won one of the tasks of the Kaggle CORD-19 Challenge, according to the qualitative evaluation of experts. The system is a combination of an Information Retrieval module and a reading comprehension module that finds the answers in the retrieved passages. In this paper we present a quantitative and qualitative analysis of the system. The quantitative evaluation using manually annotated datasets contradicted some of our design choices, e.g. the fact that using QuAC for fine-tuning provided better answers than using SQuAD alone. We analyzed this mismatch with an additional A/B test, which showed that the system using QuAC was indeed preferred by users, confirming our intuition. Our analysis calls into question the suitability of automatic metrics and their correlation with user preferences. We also show that automatic metrics are highly dependent on the characteristics of the gold standard, such as the average length of the answers.


QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
Arantxa Otegi | Nora Aranberri | Antonio Branco | Jan Hajič | Martin Popel | Kiril Simov | Eneko Agirre | Petya Osenova | Rita Pereira | João Silva | Steven Neale
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is common to all parallel corpora, with translations in five languages, namely Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, as well as annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer.


Query Expansion for IR using Knowledge-Based Relatedness
Arantxa Otegi | Xabier Arregi | Eneko Agirre
Proceedings of 5th International Joint Conference on Natural Language Processing


Document Expansion Based on WordNet for Robust IR
Eneko Agirre | Xabier Arregi | Arantxa Otegi
Coling 2010: Posters


SemEval-2007 Task 01: Evaluating WSD on Cross-Language Information Retrieval
Eneko Agirre | Bernardo Magnini | Oier Lopez de Lacalle | Arantxa Otegi | German Rigau | Piek Vossen
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)