Iria da Cunha

Also published as: I. da Cunha

2018

The RST Spanish-Chinese Treebank
Shuyuan Cao | Iria da Cunha | Mikel Iruskieta
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

Discourse analysis is necessary for different tasks of Natural Language Processing (NLP). Because Spanish and Chinese are two of the most spoken languages in the world, discourse analysis between them is important for NLP research. This paper presents the first open Spanish-Chinese parallel corpus annotated with discourse information, whose theoretical framework is based on Rhetorical Structure Theory (RST). We have evaluated and harmonized each part of the annotation to obtain a corpus with high-quality annotation. The corpus is publicly available.

2017

The arText prototype: An automatic system for writing specialized texts
Iria da Cunha | M. Amor Montané | Luis Hysa
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

This article describes an automatic system for writing specialized texts in Spanish. The arText prototype is a free online text editor that includes different types of linguistic information. It is designed for a variety of end users and domains, including specialists and university students working in the fields of medicine and tourism, and laypersons writing to the public administration. arText provides guidance on how to structure a text, prompts users to include all necessary content in each section, and detects lexical and discourse problems in the text.

Discourse Segmentation for Building a RST Chinese Treebank
Shuyuan Cao | Nianwen Xue | Iria da Cunha | Mikel Iruskieta | Chuan Wang
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

2016

A Corpus-based Approach for Spanish-Chinese Language Learning
Shuyuan Cao | Iria da Cunha | Mikel Iruskieta
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

Because of the huge number of people who speak Spanish and Chinese, these languages occupy an important position in language learning studies. Although some automatic translation systems benefit the learning of both languages, there is still room to create resources that help language learners. As a quick and effective resource that can provide a large amount of linguistic information, corpus-based learning is becoming increasingly popular. In this paper we enrich a Spanish-Chinese parallel corpus automatically with part-of-speech (POS) information and manually with discourse segmentation (following Rhetorical Structure Theory (RST) (Mann and Thompson, 1988)). Two search tools allow Spanish-Chinese language learners to carry out different queries based on tokens and lemmas. The parallel corpus and the search tools are available to the academic community. We present some examples to illustrate how learners can use the corpus to learn Spanish and Chinese.

CobaltF: A Fluent Metric for MT Evaluation
Marina Fomicheva | Núria Bel | Lucia Specia | Iria da Cunha | Anton Malinovskiy
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

This paper presents a new algorithm for automatic summarization of specialized texts that combines terminological and semantic resources: a term extractor and an ontology. The term extractor provides the list of terms present in the text together with their corresponding termhood. The ontology is used to calculate the semantic similarity between the terms found in the main body and those present in the document title. The general idea is to obtain a relevance score for each sentence that takes into account both the termhood of the terms found in that sentence and the similarity between those terms and the terms present in the title of the document. The sentences with the highest scores are chosen for the final summary. We evaluate the algorithm with Rouge, comparing the resulting summaries with those of other summarizers. The sentence selection algorithm was also tested as part of a standalone summarizer. In both cases it obtains fairly good results, although there is room for improvement.
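The sentence-scoring idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not say how termhood and title similarity are combined, so the additive combination, the `weight` parameter and the `similarity` callable (standing in for the ontology) are all hypothetical.

```python
from typing import Callable, Dict, List, Set


def sentence_score(
    sentence_terms: List[str],
    termhood: Dict[str, float],
    title_terms: Set[str],
    similarity: Callable[[str, str], float],
    weight: float = 1.0,
) -> float:
    """Hypothetical relevance score: termhood plus weighted similarity to title terms."""
    score = 0.0
    for term in sentence_terms:
        # Termhood as reported by a term extractor (0.0 if the term is unknown).
        t = termhood.get(term, 0.0)
        # Best semantic match between this term and any title term;
        # `similarity` is an assumed stand-in for an ontology-based measure.
        best_sim = max((similarity(term, tt) for tt in title_terms), default=0.0)
        score += t + weight * best_sim
    return score


def summarize(
    sentences: List[str],
    sentence_terms: List[List[str]],
    termhood: Dict[str, float],
    title_terms: Set[str],
    similarity: Callable[[str, str], float],
    k: int = 2,
    weight: float = 1.0,
) -> List[str]:
    """Select the k highest-scoring sentences and return them in document order."""
    scores = [
        sentence_score(terms, termhood, title_terms, similarity, weight)
        for terms in sentence_terms
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

With a toy exact-match similarity (1.0 for identical terms, 0.0 otherwise), a sentence containing both a high-termhood term and a title term outranks sentences that contain only one of them. Returning the selected sentences in document order keeps the extract readable.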

2010

Multilingual Summarization Evaluation without Human Models
Horacio Saggion | Juan-Manuel Torres-Moreno | Iria da Cunha | Eric SanJuan | Patricia Velázquez-Morales
Coling 2010: Posters

Évaluation automatique de résumés avec et sans référence (Automatic summary evaluation with and without references)
Juan-Manuel Torres-Moreno | Horacio Saggion | Iria da Cunha | Patricia Velázquez-Morales | Eric Sanjuan
Proceedings of the 17th Conference on Natural Language Processing (TALN), Long Papers

We study different content-based methods for evaluating document summaries. In particular, we examine the correlation between evaluation measures with and without human references. We have developed FRESA, a new content-based evaluation system that computes divergences between probability distributions. We apply our comparison system to various well-known summarization evaluation measures, such as Coverage, Responsiveness, Pyramids and Rouge, and study their associations in generic multi-document summarization (French/English), focused summarization (English) and generic single-document summarization (French/Spanish).
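The idea of evaluating a summary without a human reference by comparing probability distributions can be illustrated with a small sketch. This is not the FRESA implementation: it assumes plain unigram distributions and uses the Jensen-Shannon divergence (base-2 logarithms, so values lie in [0, 1]) as one example of such a divergence; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def distribution(tokens: List[str]) -> Dict[str, float]:
    """Relative-frequency (unigram) distribution of a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence with base-2 logs (0 = identical, 1 = disjoint)."""
    vocab = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(a: Dict[str, float]) -> float:
        # KL(A || M); M is positive wherever A is, so the log is defined.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)
```

Under this sketch, a summary whose word distribution stays closer to the source document gets a lower divergence: for the source "the cat sat on the mat", the summary "cat sat mat" scores below 1.0, whereas "dogs bark loudly", which shares no words with the source, scores exactly 1.0.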
