Steve Renals


2021

pdf bib
European Language Grid: A Joint Platform for the European Language Technology Community
Georg Rehm | Stelios Piperidis | Kalina Bontcheva | Jan Hajic | Victoria Arranz | Andrejs Vasiļjevs | Gerhard Backfried | Jose Manuel Gomez-Perez | Ulrich Germann | Rémi Calizzano | Nils Feldhus | Stefanie Hegele | Florian Kintzel | Katrin Marheinecke | Julian Moreno-Schneider | Dimitris Galanis | Penny Labropoulou | Miltos Deligiannis | Katerina Gkirtzou | Athanasia Kolovou | Dimitris Gkoumas | Leon Voukoutis | Ian Roberts | Jana Hamrlova | Dusan Varis | Lukas Kacena | Khalid Choukri | Valérie Mapelli | Mickaël Rigault | Julija Melnika | Miro Janosik | Katja Prinz | Andres Garcia-Silva | Cristian Berrio | Ondrej Klejch | Steve Renals
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Europe is a multilingual society, in which dozens of languages are spoken. The only option to enable and to benefit from multilingualism is through Language Technologies (LT), i.e., Natural Language Processing and Speech Technologies. We describe the European Language Grid (ELG), which is targeted to evolve into the primary platform and marketplace for LT in Europe by providing one umbrella platform for the European LT landscape, including research and industry, enabling all stakeholders to upload, share and distribute their services, products and resources. At the end of our EU project, which will establish a legal entity in 2022, the ELG will provide access to approx. 1300 services for all European languages as well as thousands of data sets.

2020

pdf bib
European Language Grid: An Overview
Georg Rehm | Maria Berger | Ela Elsholz | Stefanie Hegele | Florian Kintzel | Katrin Marheinecke | Stelios Piperidis | Miltos Deligiannis | Dimitris Galanis | Katerina Gkirtzou | Penny Labropoulou | Kalina Bontcheva | David Jones | Ian Roberts | Jan Hajič | Jana Hamrlová | Lukáš Kačena | Khalid Choukri | Victoria Arranz | Andrejs Vasiļjevs | Orians Anvari | Andis Lagzdiņš | Jūlija Meļņika | Gerhard Backfried | Erinç Dikici | Miroslav Janosik | Katja Prinz | Christoph Prinz | Severin Stampler | Dorothea Thomas-Aniola | José Manuel Gómez-Pérez | Andres Garcia Silva | Christian Berrío | Ulrich Germann | Steve Renals | Ondrej Klejch
Proceedings of the 12th Language Resources and Evaluation Conference

With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented – by nation states, languages, verticals and sectors, significantly holding back its impact. The European Language Grid (ELG) project addresses this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG is a scalable cloud platform, providing, in an easy-to-integrate way, access to hundreds of commercial and non-commercial LTs for all European languages, including running tools and services as well as data sets and resources. Once fully operational, it will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Furthermore, the ELG project organises two open calls for up to 20 pilot projects. It also sets up 32 national competence centres and the European LT Council for outreach and coordination purposes.

2018

pdf bib
Word Error Rate Estimation for Speech Recognition: e-WER
Ahmed Ali | Steve Renals
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data in order to compute the word error rate (WER), which is often time-consuming and expensive. In this paper, we propose a novel approach to estimate WER, or e-WER, which does not require a gold-standard transcription of the test set. Our e-WER framework uses a comprehensive set of features: ASR recognised text, character recognition results to complement recognition output, and internal decoder features. We report results for the two features; black-box and glass-box using unseen 24 Arabic broadcast programs. Our system achieves 16.9% WER root mean squared error (RMSE) across 1,400 sentences. The estimated overall WER e-WER was 25.3% for the three hours test set, while the actual WER was 28.5%.

2017

pdf bib
The SUMMA Platform Prototype
Renars Liepins | Ulrich Germann | Guntis Barzdins | Alexandra Birch | Steve Renals | Susanne Weber | Peggy van der Kreeft | Hervé Bourlard | João Prieto | Ondřej Klejch | Peter Bell | Alexandros Lazaridis | Alfonso Mendes | Sebastian Riedel | Mariana S. C. Almeida | Pedro Balage | Shay B. Cohen | Tomasz Dwojak | Philip N. Garner | Andreas Giefer | Marcin Junczys-Dowmunt | Hina Imran | David Nogueira | Ahmed Ali | Sebastião Miranda | Andrei Popescu-Belis | Lesly Miculicich Werlen | Nikos Papasarantopoulos | Abiola Obamuyide | Clive Jones | Fahim Dalvi | Andreas Vlachos | Yang Wang | Sibo Tong | Rico Sennrich | Nikolaos Pappas | Shashi Narayan | Marco Damonte | Nadir Durrani | Sameer Khurana | Ahmed Abdelali | Hassan Sajjad | Stephan Vogel | David Sheppey | Chris Hernon | Jeff Mitchell
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.

2016

pdf bib
Character-Level Neural Translation for Multilingual Media Monitoring in the SUMMA Project
Guntis Barzdins | Steve Renals | Didzis Gosko
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The paper steps outside the comfort-zone of the traditional NLP tasks like automatic speech recognition (ASR) and machine translation (MT) to addresses two novel problems arising in the automated multilingual news monitoring: segmentation of the TV and radio program ASR transcripts into individual stories, and clustering of the individual stories coming from various sources and languages into storylines. Storyline clustering of stories covering the same events is an essential task for inquisitorial media monitoring. We address these two problems jointly by engaging the low-dimensional semantic representation capabilities of the sequence to sequence neural translation models. To enable joint multi-task learning for multilingual neural translation of morphologically rich languages we replace the attention mechanism with the sliding-window mechanism and operate the sequence to sequence neural translation model on the character-level rather than on the word-level. The story segmentation and storyline clustering problem is tackled by examining the low-dimensional vectors produced as a side-product of the neural translation process. The results of this paper describe a novel approach to the automatic story segmentation and storyline clustering problem.

2015

pdf bib
Multi-Reference Evaluation for Dialectal Speech Recognition System: A Study for Egyptian ASR
Ahmed Ali | Walid Magdy | Steve Renals
Proceedings of the Second Workshop on Arabic Natural Language Processing

2014

pdf bib
The UEDIN ASR systems for the IWSLT 2014 evaluation
Peter Bell | Pawel Swietojanski | Joris Driesen | Mark Sinclair | Fergus McInnes | Steve Renals
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the University of Edinburgh (UEDIN) ASR systems for the 2014 IWSLT Evaluation. Notable features of the English system include deep neural network acoustic models in both tandem and hybrid configuration with the use of multi-level adaptive networks, LHUC adaptation and Maxout units. The German system includes lightly supervised training and a new method for dictionary generation. Our voice activity detection system now uses a semi-Markov model to incorporate a prior on utterance lengths. There are improvements of up to 30% relative WER on the tst2013 English test set.

2013

pdf bib
Description of the UEDIN system for German ASR
Joris Driesen | Peter Bell | Mark Sinclair | Steve Renals
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper we describe the ASR system for German built at the University of Edinburgh (UEDIN) for the 2013 IWSLT evaluation campaign. For ASR, the major challenge to overcome, was to find suitable acoustic training data. Due to the lack of expertly transcribed acoustic speech data for German, acoustic model training had to be performed on publicly available data crawled from the internet. For evaluation, lack of a manual segmentation into utterances was handled in two different ways: by generating an automatic segmentation, and by treating entire input files as a single segment. Demonstrating the latter method is superior in the current task, we obtained a WER of 28.16% on the dev set and 36.21% on the test set.

pdf bib
The UEDIN English ASR system for the IWSLT 2013 evaluation
Peter Bell | Fergus McInnes | Siva Reddy Gangireddy | Mark Sinclair | Alexandra Birch | Steve Renals
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the University of Edinburgh (UEDIN) English ASR system for the IWSLT 2013 Evaluation. Notable features of the system include deep neural network acoustic models in both tandem and hybrid configuration, cross-domain adaptation with multi-level adaptive networks, and the use of a recurrent neural network language model. Improvements to our system since the 2012 evaluation – which include the use of a significantly improved n-gram language model – result in a 19% relative WER reduction on the tst2012 set.

2012

pdf bib
The UEDIN systems for the IWSLT 2012 evaluation
Eva Hasler | Peter Bell | Arnab Ghoshal | Barry Haddow | Philipp Koehn | Fergus McInnes | Steve Renals | Pawel Swietojanski
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the University of Edinburgh (UEDIN) systems for the IWSLT 2012 Evaluation. We participated in the ASR (English), MT (English-French, German-English) and SLT (English-French) tracks.

2010

pdf bib
Invited Talk: Recognition and Understanding of Meetings
Steve Renals
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2007

pdf bib
Automatic Segmentation and Summarization of Meeting Speech
Gabriel Murray | Pei-Yun Hsueh | Simon Tucker | Jonathan Kilgour | Jean Carletta | Johanna D. Moore | Steve Renals
Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)

2006

pdf bib
Automatic Segmentation of Multiparty Dialogue
Pei-Yun Hsueh | Johanna D. Moore | Steve Renals
11th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Prosodic Correlates of Rhetorical Relations
Gabriel Murray | Maite Taboada | Steve Renals
Proceedings of the Analyzing Conversations in Text and Speech

pdf bib
Incorporating Speaker and Discourse Features into Speech Summarization
Gabriel Murray | Steve Renals | Jean Carletta | Johanna Moore
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

pdf bib
Evaluating Automatic Summaries of Meeting Recordings
Gabriel Murray | Steve Renals | Jean Carletta | Johanna Moore
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization

Search
Co-authors
Venues