Carla Parra Escartín

Also published as: Carla Parra, Carla Parra Escartin, Carla Parra Escartin, Carla Parra Escartín


2023

pdf bib
PARSEME corpus release 1.3
Agata Savary | Cherifa Ben Khelil | Carlos Ramisch | Voula Giouli | Verginica Barbu Mititelu | Najet Hadj Mohamed | Cvetana Krstev | Chaya Liebeskind | Hongzhi Xu | Sara Stymne | Tunga Güngör | Thomas Pickard | Bruno Guillaume | Eduard Bejček | Archna Bhatia | Marie Candito | Polona Gantar | Uxoa Iñurrieta | Albert Gatt | Jolanta Kovalevskaite | Timm Lichte | Nikola Ljubešić | Johanna Monti | Carla Parra Escartín | Mehrnoush Shamsfard | Ivelina Stoyanova | Veronika Vincze | Abigail Walsh
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)

We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions. Since the previous version, new languages have joined the undertaking of creating such a resource, some of the already existing corpora have been enriched with new annotated texts, while others have been enhanced in various ways. The PARSEME multilingual corpus represents 26 languages now. All monolingual corpora therein use Universal Dependencies v.2 tagset. They are (re-)split observing the PARSEME v.1.2 standard, which puts impact on unseen VMWEs. With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.

pdf bib
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Mary Nurminen | Judith Brenner | Maarit Koponen | Sirkku Latomaa | Mikhail Mikhailov | Frederike Schierl | Tharindu Ranasinghe | Eva Vanmassenhove | Sergi Alvarez Vidal | Nora Aranberri | Mara Nunziatini | Carla Parra Escartín | Mikel Forcada | Maja Popovic | Carolina Scarton | Helena Moniz
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

2022

pdf bib
Unsupervised Machine Translation in Real-World Scenarios
Ona de Gibert Bonet | Iakes Goenaga | Jordi Armengol-Estapé | Olatz Perez-de-Viñaspre | Carla Parra Escartín | Marina Sanchez | Mārcis Pinnis | Gorka Labaka | Maite Melero
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this work, we present the work that has been carried on in the MT4All CEF project and the resources that it has generated by leveraging recent research carried out in the field of unsupervised learning. In the course of the project 18 monolingual corpora for specific domains and languages have been collected, and 12 bilingual dictionaries and translation models have been generated. As part of the research, the unsupervised MT methodology based only on monolingual corpora (Artetxe et al., 2017) has been tested on a variety of languages and domains. Results show that in specialised domains, when there is enough monolingual in-domain data, unsupervised results are comparable to those of general domain supervised translation, and that, at any rate, unsupervised techniques can be used to boost results whenever very little data is available.

2021

pdf bib
Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)
Paul Cook | Jelena Mitrović | Carla Parra Escartín | Ashwini Vaidya | Petya Osenova | Shiva Taslimipoor | Carlos Ramisch
Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)

2020

pdf bib
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
André Martins | Helena Moniz | Sara Fumega | Bruno Martins | Fernando Batista | Luisa Coheur | Carla Parra | Isabel Trancoso | Marco Turchi | Arianna Bisazza | Joss Moorkens | Ana Guerberof | Mary Nurminen | Lena Marg | Mikel L. Forcada
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

2019

pdf bib
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
Agata Savary | Carla Parra Escartín | Francis Bond | Jelena Mitrović | Verginica Barbu Mititelu
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

2018

pdf bib
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Carlos Ramisch | Silvio Ricardo Cordeiro | Agata Savary | Veronika Vincze | Verginica Barbu Mititelu | Archna Bhatia | Maja Buljan | Marie Candito | Polona Gantar | Voula Giouli | Tunga Güngör | Abdelati Hawwari | Uxoa Iñurrieta | Jolanta Kovalevskaitė | Simon Krek | Timm Lichte | Chaya Liebeskind | Johanna Monti | Carla Parra Escartín | Behrang QasemiZadeh | Renata Ramisch | Nathan Schneider | Ivelina Stoyanova | Ashwini Vaidya | Abigail Walsh
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.

2017

pdf bib
Improving Evaluation of Document-level Machine Translation Quality Estimation
Yvette Graham | Qingsong Ma | Timothy Baldwin | Qun Liu | Carla Parra | Carolina Scarton
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable. In this paper, we explore the validity of human annotations currently employed in the evaluation of document-level quality estimation for machine translation (MT). We demonstrate the degree to which MT system rankings are dependent on weights employed in the construction of the gold standard, before proposing direct human assessment as a valid alternative. Experiments show direct assessment (DA) scores for documents to be highly reliable, achieving a correlation of above 0.9 in a self-replication experiment, in addition to a substantial estimated cost reduction through quality controlled crowd-sourcing. The original gold standard based on post-edits incurs a 10–20 times greater cost than DA.

pdf bib
Machine Translation as an Academic Writing Aid for Medical Practitioners
Carla Parra Escartín | Sharon O’Brien | Marie-Josée Goulet | Michel Simard
Proceedings of Machine Translation Summit XVI: Research Track

pdf bib
Ethical Considerations in NLP Shared Tasks
Carla Parra Escartín | Wessel Reijers | Teresa Lynn | Joss Moorkens | Andy Way | Chao-Hong Liu
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

Shared tasks are increasingly common in our field, and new challenges are suggested at almost every conference and workshop. However, as this has become an established way of pushing research forward, it is important to discuss how we researchers organise and participate in shared tasks, and make that information available to the community to allow further research improvements. In this paper, we present a number of ethical issues along with other areas of concern that are related to the competitive nature of shared tasks. As such issues could potentially impact on research ethics in the Natural Language Processing community, we also propose the development of a framework for the organisation of and participation in shared tasks that can help mitigate against these issues arising.

2016

pdf bib
Combining Translation Memories and Syntax-Based SMT: Experiments with Real Industrial Data
Liangyou Li | Carla Parra Escartin | Qun Liu
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf bib
Semantic Textual Similarity in Quality Estimation
Hanna Bechara | Carla Parra Escartin | Constantin Orasan | Lucia Specia
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf bib
PARSEME Survey on MWE Resources
Gyri Smørdal Losnegaard | Federico Sangati | Carla Parra Escartín | Agata Savary | Sascha Bargmann | Johanna Monti
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper summarizes the preliminary results of an ongoing survey on multiword resources carried out within the IC1207 Cost Action PARSEME (PARSing and Multi-word Expressions). Despite the availability of language resource catalogs and the inventory of multiword datasets on the SIGLEX-MWE website, multiword resources are scattered and difficult to find. In many cases, language resources such as corpora, treebanks, or lexical databases include multiwords as part of their data or take them into account in their annotations. However, these resources need to be centralized to make them accessible. The aim of this survey is to create a portal where researchers can easily find multiword(-aware) language resources for their research. We report on the design of the survey and analyze the data gathered so far. We also discuss the problems we have detected upon examination of the data as well as possible ways of enhancing the survey.

2015

pdf bib
Machine translation evaluation made fuzzier: a study on post-editing productivity and evaluation metrics in commercial settings
Carla Parra Escartín | Manuel Arcedillo
Proceedings of Machine Translation Summit XV: Papers

pdf bib
Living on the edge: productivity gain thresholds in machine translation evaluation metrics
Carla Parra Escartin | Manuel Arcedillo
Proceedings of the 4th Workshop on Post-editing Technology and Practice

pdf bib
A fuzzier approach to machine translation evaluation: A pilot study on post-editing productivity and automated metrics in commercial settings
Carla Parra Escartín | Manuel Arcedillo
Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)

pdf bib
Creation of new TM segments: Fulfilling translators’ wishes
Carla Parra Escartín
Proceedings of the Workshop Natural Language Processing for Translation Memories

2014

pdf bib
Chasing the Perfect Splitter: A Comparison of Different Compound Splitting Tools
Carla Parra Escartín
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper reports on the evaluation of two compound splitters for German. Compounding is a very frequent phenomenon in German and thus efficient ways of detecting and correctly splitting compound words are needed for natural language processing applications. This paper presents different strategies for compound splitting, focusing on German. Four compound splitters for German are presented. Two of them were used in Statistical Machine Translation (SMT) experiments, obtaining very similar qualitative scores in terms of BLEU and TER and therefore a thorough evaluation of both has been carried out.

pdf bib
German Compounds and Statistical Machine Translation. Can they get along?
Carla Parra Escartín | Stephan Peitz | Hermann Ney
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

2012

pdf bib
Design and compilation of a specialized Spanish-German parallel corpus
Carla Parra Escartín
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper discusses the design and compilation of the TRIS corpus, a specialized parallel corpus of Spanish and German texts. It will be used for phraseological research aimed at improving statistical machine translation. The corpus is based on the European database of Technical Regulations Information System (TRIS), containing 995 original documents written in German and Spanish and their translations into Spanish and German respectively. This parallel corpus is under development and the first version with 97 aligned file pairs was released in the first META-NORD upload of metadata and resources in November 2011. The second version of the corpus, described in the current paper, contains 205 file pairs which have been completely aligned at sentence level, which account for approximately 1,563,000 words and 70,648 aligned sentence pairs.
Search
Co-authors