CORILGA: a Galician Multilevel Annotated Speech Corpus for Linguistic Analysis
Carmen García-Mateo
Antonio Cardenal
Xosé Luis Regueira
Elisa Fernández Rei
Marta Martinez
Roberto Seara
Rocío Varela
Noemí Basanta
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper describes CORILGA (Corpus Oral Informatizado da Lingua Galega). CORILGA is a large, high-quality corpus of spoken Galician from the 1960s to the present day, including both formal and informal spoken language from both standard and non-standard varieties, and across different generations and social levels. The corpus will be available to the research community upon completion. Galician is one of the EU languages that needs further research before highly effective language technology solutions can be implemented. A software repository for speech resources in Galician is also described. The repository includes a structured database, a graphical interface and processing tools. The database enables simple, fast searches based on a number of different criteria. The web-based user interface gives users easy access to the different materials. Last but not least, a set of transcription-based modules for automatic speech recognition has been developed, thus facilitating the orthographic labelling of the recordings.
BUCEADOR, a multi-language search engine for digital libraries
Jordi Adell
Antonio Bonafonte
Antonio Cardenal
Marta R. Costa-Jussà
José A. R. Fonollosa
Asunción Moreno
Eva Navas
Eduardo R. Banga
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper presents a web-based multimedia search engine built within the Buceador (www.buceador.org) research project. A proof-of-concept tool has been implemented which is able to retrieve information from a digital library made up of multimedia documents in the four official languages of Spain (Spanish, Basque, Catalan and Galician). The retrieved documents are presented in the user's language (any of these four languages, or English) after translation and dubbing. The paper presents the tool's functionality, its architecture and the digital library, and provides some information about the technology involved in the fields of automatic speech recognition, statistical machine translation, text-to-speech synthesis and information retrieval. Each technology has been adapted to the purposes of the presented tool as well as to interact with the rest of the technologies involved.
Transcrigal: A Bilingual System for Automatic Indexing of Broadcast News
Carmen Garcia-Mateo
Javier Dieguez-Tirado
Laura Docio-Fernandez
Antonio Cardenal-Lopez
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04)