Asunción Moreno

Also published as: A. Moreno, Asuncion Moreno, Asuncíon Moreno


2020

Proceedings of the Twelfth Language Resources and Evaluation Conference
Nicoletta Calzolari | Frédéric Béchet | Philippe Blache | Khalid Choukri | Christopher Cieri | Thierry Declerck | Sara Goggi | Hitoshi Isahara | Bente Maegaard | Joseph Mariani | Hélène Mazo | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the Twelfth Language Resources and Evaluation Conference

Abusive language in Spanish children and young teenager’s conversations: data preparation and short text classification with contextual word embeddings
Marta R. Costa-jussà | Esther González | Asuncion Moreno | Eudald Cumalat
Proceedings of the Twelfth Language Resources and Evaluation Conference

Abusive texts are attracting the interest of the scientific and social communities. How to detect them automatically is a question that is gaining interest in the natural language processing community. The main contribution of this paper is to evaluate the quality of the recently developed “Spanish Database for cyberbullying prevention” for training classifiers to detect abusive short texts. We compare classical machine learning techniques with a more advanced model, contextual word embeddings, for the particular case of classifying abusive short texts in Spanish. As contextual word embeddings, we use Bidirectional Encoder Representations from Transformers (BERT), proposed at the end of 2018. We show that BERT mostly outperforms classical techniques. Beyond the experimental impact of our research, this project aims to plant the seeds of an innovative technological tool with high potential social impact, intended to be part of the initiatives in artificial intelligence for social good.

2018

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Nicoletta Calzolari | Khalid Choukri | Christopher Cieri | Thierry Declerck | Sara Goggi | Koiti Hasida | Hitoshi Isahara | Bente Maegaard | Joseph Mariani | Hélène Mazo | Asuncion Moreno | Jan Odijk | Stelios Piperidis | Takenobu Tokunaga
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Nicoletta Calzolari | Khalid Choukri | Thierry Declerck | Sara Goggi | Marko Grobelnik | Bente Maegaard | Joseph Mariani | Helene Mazo | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

2014

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Nicoletta Calzolari | Khalid Choukri | Thierry Declerck | Hrafn Loftsson | Bente Maegaard | Joseph Mariani | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Strategic Impact of META-NET on the Regional, National and International Level
Georg Rehm | Hans Uszkoreit | Sophia Ananiadou | Núria Bel | Audronė Bielevičienė | Lars Borin | António Branco | Gerhard Budin | Nicoletta Calzolari | Walter Daelemans | Radovan Garabík | Marko Grobelnik | Carmen García-Mateo | Josef van Genabith | Jan Hajič | Inma Hernáez | John Judge | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Joseph Mariani | John McNaught | Maite Melero | Monica Monachini | Asunción Moreno | Jan Odijk | Maciej Ogrodniczuk | Piotr Pęzik | Stelios Piperidis | Adam Przepiórkowski | Eiríkur Rögnvaldsson | Michael Rosner | Bolette Pedersen | Inguna Skadiņa | Koenraad De Smedt | Marko Tadić | Paul Thompson | Dan Tufiş | Tamás Váradi | Andrejs Vasiļjevs | Kadri Vider | Jolanta Zabarskaite
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative’s work throughout Europe in order to boost progress and innovation in our field.

2012

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Nicoletta Calzolari | Khalid Choukri | Thierry Declerck | Mehmet Uğur Doğan | Bente Maegaard | Joseph Mariani | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

BUCEADOR, a multi-language search engine for digital libraries
Jordi Adell | Antonio Bonafonte | Antonio Cardenal | Marta R. Costa-Jussà | José A. R. Fonollosa | Asunción Moreno | Eva Navas | Eduardo R. Banga
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a web-based multimedia search engine built within the Buceador (www.buceador.org) research project. A proof-of-concept tool has been implemented that is able to retrieve information from a digital library of multimedia documents in the four official languages of Spain (Spanish, Basque, Catalan and Galician). The retrieved documents are presented in the user's language after translation and dubbing (the four previous languages plus English). The paper presents the tool's functionality, its architecture and the digital library, and provides some information about the technologies involved: automatic speech recognition, statistical machine translation, text-to-speech synthesis and information retrieval. Each technology has been adapted to the purposes of the tool and to interact with the other technologies involved.

Building Synthetic Voices in the META-NET Framework
Emília Garcia Casademont | Antonio Bonafonte | Asunción Moreno
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

METANET4U is a European project aiming to support language technology for European languages and multilingualism. It is a project in the META-NET Network of Excellence, a cluster of projects aiming to foster the mission of META, the Multilingual Europe Technology Alliance, which is dedicated to building the technological foundations of a multilingual European information society. This paper describes the resources produced at our lab to provide synthetic voices. Using existing 10-hour corpora of a male and a female Spanish speaker, voices have been developed for use in Festival, with both unit-selection and statistical technologies. Furthermore, using data produced to support research on intra- and inter-lingual voice conversion, four bilingual voices (English/Spanish) have been developed. The paper describes these resources, which are available through META. In addition, an evaluation is presented that compares different synthesis techniques, the influence of the amount of data in statistical speech synthesis, and the effect of sharing data in bilingual voices.

2008

LC-STAR II: Starring more Lexica
Ute Ziegenhain | Hanne Fersoe | Henk van den Heuvel | Asuncion Moreno
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

LC-STAR II is a follow-up project of the EU-funded project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components, IST-2001-32216). LC-STAR II develops large lexica containing information for speech processing in ten languages, targeting especially automatic speech recognition and text-to-speech synthesis, but also other applications such as speech-to-speech translation and tagging. The project follows by and large the specifications developed within the scope of LC-STAR, which covered thirteen languages: Catalan, Finnish, German, Greek, Hebrew, Italian, Mandarin Chinese, Russian, Turkish, Slovenian, Spanish, Standard Arabic and US-English. The ten new LC-STAR II languages are: Brazilian-Portuguese, Cantonese, Czech, English-UK, French, Hindi, Polish, Portuguese, Slovak, and Urdu. The project started in 2006 with a lifetime of two years. The project is funded by a consortium that includes Microsoft (USA), Nokia (Finland), NSC (Israel), Siemens (Germany) and Harman/Becker (Germany). The project is coordinated by UPC (Spain), and validation is performed by SPEX (The Netherlands) and CST (Denmark). The developed language resources will be shared among partners. This paper presents a summary of the creation of word lists and lexica and an overview of the adaptations of the specifications and the conceptual representation model from LC-STAR to the new languages. The validation procedure is also presented.

LILA: Cellular Telephone Speech Databases from Asia
Eric Sanders | Asuncion Moreno | Herbert Tropf | Lynette Melnar | Nurit Dekel | Breanna Gillies | Niklas Paulsson
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The goal of the LILA project was the collection of speech databases over cellular telephone networks for five languages in three Asian countries. Three languages were recorded in India: Hindi by first-language speakers, Hindi by second-language speakers, and Indian English. Furthermore, Mandarin was recorded in China and Korean in South Korea. The databases are part of the SpeechDat family and follow the SpeechDat rules in many respects. All databases have been finished and have passed the validation tests. Both Hindi databases and the Korean database will be available to the public for sale.

Corpus and Voices for Catalan Speech Synthesis
Antonio Bonafonte | Jordi Adell | Ignasi Esquerra | Silvia Gallego | Asunción Moreno | Javier Pérez
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we describe the design and production of a Catalan database for building synthetic voices. Two speakers have each recorded 10 hours of speech. The speaker selection and the corpus design aim to provide resources for high-quality synthesis. The resources have been used to build voices for the Festival TTS system. Both the original recordings and the Festival databases are freely available for research and for commercial use.

2006

TC-STAR: New language resources for ASR and SLT purposes
Henk van den Heuvel | Khalid Choukri | Christian Gollan | Asuncion Moreno | Djamel Mostefa
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In TC-STAR a variety of Language Resources (LR) are being produced. In this contribution we address the resources that have been created for Automatic Speech Recognition and Spoken Language Translation. So far, these are 14 LR in total: two training SLR for ASR (English and Spanish), three development LR and three evaluation LR for ASR (English, Spanish, Mandarin), and three development LR and three evaluation LR for SLT (English-Spanish, Spanish-English, Mandarin-English). In this paper we describe the properties, validation, and availability of these resources.

Spanish Synthesis Corpora
Martí Umbert | Asunción Moreno | Pablo Agüero | Antonio Bonafonte
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper deals with the design of a synthesis database for a high-quality corpus-based speech synthesis system in Spanish. The database has been designed for speech synthesis, speech conversion and expressive speech. The design follows the specifications of the TC-STAR project and has been applied to collect equivalent English and Mandarin synthesis databases. The sentences of the corpus have been selected mainly from transcribed speech and novels. The selection criterion is phonetic and prosodic coverage. The corpus was completed with sentences specifically designed to cover frequent phrases and words. Two baseline speakers and four bilingual speakers were recorded. The recordings consist of 10 hours of speech for each baseline speaker and one hour of speech for each bilingual voice-conversion speaker. The database is labelled and segmented. Pitch marks and phonetic segmentation were obtained automatically, and up to 50% was manually supervised. The database will be available at ELRA.

Generation of Language Resources for the Development of Speech Technologies in Catalan
A. Moreno | Albert Febrer | Lluis Márquez
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes a joint initiative of the Catalan and Spanish Governments to produce Language Resources for the Catalan language. A methodology similar to the Basic Language Resource Kit (BLARK) concept was applied to determine the priorities for the production of the Language Resources. The paper shows the LR and tools currently available for the Catalan language, for both language and speech technologies. The production of large databases for Automatic Speech Recognition has already started. All the resources generated in the project follow EU standards, will be validated by an external centre, and will be freely and publicly available through ELRA.

TC-STAR: Specifications of Language Resources and Evaluation for Speech Synthesis
A. Bonafonte | H. Höge | I. Kiss | A. Moreno | U. Ziegenhain | H. van den Heuvel | H.-U. Hain | X. S. Wang | M. N. Garcia
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In the framework of the EU-funded project TC-STAR (Technology and Corpora for Speech to Speech Translation), research on TTS aims at providing a synthesized voice that sounds like the source speaker speaking the target language. To progress in this direction, research in the TC-STAR framework focuses on naturalness, intelligibility, expressivity and voice conversion. For this purpose, specifications for large, high-quality TTS databases have been developed, and the data have been recorded for UK English, Spanish and Mandarin. The development of speech technology in TC-STAR is evaluation-driven. Assessment of speech synthesis is needed to determine how well a system or technique performs in comparison with previous versions as well as with other approaches (systems and methods). Apart from testing the whole system, all components of the system will be evaluated separately. This approach allows better assessment of each component as well as identification of the best techniques in the different speech synthesis processes. This paper describes the specifications of Language Resources for speech synthesis and the specifications for the evaluation of speech synthesis activities.

2004

SALA II Across the Finish Line: A Large Collection of Mobile Telephone Speech Databases from North and Latin America completed
Henk van den Heuvel | Phil Hall | Harald Höge | Asunción Moreno | Antonio Rincon | Francesco Senia
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The SALA II project comprises mobile telephone recordings following the SpeechDat (II) paradigm for several languages in North and Latin America. Each database contains the recordings of 1000 speakers, with the exception of US Spanish (2000 speakers) and US English (4000 speakers). A quarter of the recordings in each database were made in each of four environments: a quiet environment (home/office), the street, a public place, and a moving vehicle. This paper presents an evaluation of the project. It details experiences with respect to the implementation of design specifications, speaker recruitment, data recordings (on site), data processing, orthographic transcription and lexicon generation. Furthermore, the validation procedure and its results are documented. Finally, the availability and distribution of the databases are addressed.

Collection of SLR in the Asian-Pacific Area
Asunción Moreno | Khalid Choukri | Phil Hall | Henk van den Heuvel | Eric Sanders | Francesco Senia | Herbert Tropf
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The goal of this project (LILA) is the collection of a large number of spoken databases for training Automatic Speech Recognition systems for telephone applications in the Asian-Pacific area. The specifications follow those of SpeechDat-like databases. Utterances will be recorded directly from calls made from either fixed or cellular telephones and consist of read text and answers to specific questions. The project is driven by a consortium of a large number of industrial companies. Each company is in charge of producing two databases. The consortium shares the databases produced in the project. The project's goal is expected to be reached within 2005.

Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
Hanne Fersøe | Elviira Hartikainen | Henk van den Heuvel | Giulio Maltese | Asuncíon Moreno | Shaunie Shammass | Ute Ziegenhain
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This paper presents specifications and requirements for the creation and validation of large lexica that are needed in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and statistical Speech-to-Speech Translation (SST) systems. The language resources are created and validated within the scope of the EU project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) from 2002 to 2005. Large lexica with phonetic, suprasegmental and morpho-syntactic content will be provided, with well-documented specifications, for 13 languages. A short summary of the LC-STAR project itself is presented. An overview of the specifications for corpus collection and word extraction, as well as of the specification and format of the lexica, is given. Particular attention is paid to the validation of the produced lexica and to the lessons learnt during pre-validation. The created and validated language resources will be available via ELRA/ELDA.

OrienTel - Telephony Databases Across Northern Africa and the Middle East
Dorota Iskra | Rainer Siemund | Jamal Borno | Asuncion Moreno | Ossama Emam | Khalid Choukri | Oren Gedge | Herbert Tropf | Albino Nogueiras | Imed Zitouni | Anastasios Tsopanoglou | Nikos Fakotakis
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

OrienTel - Multilingual access to interactive communication services for the Mediterranean and the Middle East
Rainer Siemund | Barbara Heuft | Khalid Choukri | Ossama Emam | Emmanuel Maragoudakis | Herbert Tropf | Oren Gedge | Sherrie Shammass | Asuncion Moreno | Albino Nogueiras Rodriguez | Imed Zitouni | Dorota Iskra
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Interface Databases: Design and Collection of a Multilingual Emotional Speech Database
Vladimir Hozjan | Zdravko Kacic | Asunción Moreno | Antonio Bonafonte | Albino Nogueiras
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

SpeechDat across all America: SALA II
Asunción Moreno | Oren Gedge | Henk van den Heuvel | Harald Höge | Sabine Horbach | Patricia Martin | Elisabeth Pinto | Antonio Rincón | Franco Senia | Rafid Sukkar
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Multidialectal Spanish Modeling for ASR
Mónica Caballero | José B. Mariño | Asunción Moreno
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

SALA: SpeechDat across Latin America. Results of the First Phase
Asunción Moreno | Robrecht Comeyne | Keith Haslam | Henk van den Heuvel | Harald Höge | Sabine Horbach | Giorgio Micca
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

SpeechDat-Car Fixed Platform
José A.R. Fonollosa | Asunción Moreno
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

NaniTrans: a Speech Labelling Tool
David Portabella | Albert Febrer | Asunción Moreno
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

SPEECHDAT-CAR. A Large Speech Database for Automotive Environments
Asunción Moreno | Børge Lindberg | Christoph Draxler | Gaël Richard | Khalid Choukri | Stephan Euler | Jeffrey Allen
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)
