Manny Rayner

Also published as: M. Rayner

2024

pdf bib abs
T is for Treu, but how do you pronounce that? Using C-LARA to create phonetic texts for Kanak languages
Pauline Welby | Fabrice Wacalie | Manny Rayner | Chatgpt-4 C-Lara-Instance
Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages

In Drehu, a language of the indigenous Kanak people of New Caledonia, the word treu ‘moon’ is pronounced [tʃe.u]; but, even if they hear the word, the spelling pulls French speakers to a spurious pronunciation [tʁø]. We implement a strategy to mitigate the influence of such orthographic conflicts, while retaining the benefits of written input on vocabulary learning. We present text in “phonetized” form, where words are broken down into components associated with mnemonically presented phonetic values, adapting features from the “Comment ça se prononce ?” multilingual phonetizer. We present an exploratory project where we used the ChatGPT-based Learning And Reading Assistant (C-LARA) to implement a version of the phonetizer strategy, outlining how the AI-engineered codebase and help from the AI made it easy to add the necessary extensions. We describe two proof-of-concept texts for learners produced using the platform, a Drehu alphabet book and a Drehu version of “The (North) Wind and the Sun”; both texts include native-speaker recorded audio, pronunciation respellings based on French orthography, and AI-generated illustrations.

2023

pdf bib
Using LARA to rescue a legacy Pitjantjatjara course
Manny Rayner | Sasha Wilmoth
Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages

We present a cross-linguistic study in which the open source C-LARA platform was used to evaluate GPT-4’s ability to perform several key tasks relevant to Computer Assisted Language Learning. For each of the languages English, Farsi, Faroese, Mandarin and Russian, we instructed GPT-4, through C-LARA, to write six different texts, using prompts chosen to obtain texts of widely differing character. We then further instructed GPT-4 to annotate each text with segmentation markup, glosses and lemma/part-of-speech information; native speakers hand-corrected the texts and annotations to obtain error rates on the different component tasks. The C-LARA platform makes it easy to combine the results into a single multimodal document, further facilitating checking of their correctness. GPT-4’s performance varied widely across languages and processing tasks, but performance on different text genres was roughly comparable. In some cases, most notably glossing of English text, we found that GPT-4 was consistently able to revise its annotations to improve them.

2022

A popular idea in Computer Assisted Language Learning (CALL) is to use multimodal annotated texts, with annotations typically including embedded audio and translations, to support L2 learning through reading. An important question is how to create good quality audio, which can be done either through human recording or by a Text-To-Speech (TTS) engine. We may reasonably expect TTS to be quicker and easier, but human recording to be of higher quality. Here, we report a study using the open source LARA platform and ten languages. Samples of audio totalling about five minutes, representing the same four passages taken from LARA versions of Saint-Exupéry’s “Le petit prince”, were provided for each language in both human and TTS form; the passages were chosen to instantiate the 2x2 cross product of the conditions dialogue, not-dialogue and humour, not-humour. 251 subjects used a web form to compare human and TTS versions of each item and rate the voices as a whole. For the three languages where TTS did best, English, French and Irish, the evidence from this study and the previous one it extended suggests that TTS audio is now pedagogically adequate and roughly comparable with a non-professional human voice in terms of exemplifying correct pronunciation and prosody. It was however still judged substantially less natural and less pleasant to listen to. No clear evidence was found to support the hypothesis that dialogue and humour pose special problems for TTS. All data and software will be made freely available.

We describe recent extensions to the open source Learning And Reading Assistant (LARA) supporting image-based and phonetically annotated texts. We motivate the utility of these extensions both in general and specifically in relation to endangered and archaic languages, and illustrate with examples from the revived Australian language Barngarla, Icelandic Sign Language, Irish Gaelic, Old Norse manuscripts and Egyptian hieroglyphics.

pdf bib
Using public domain resources and off-the-shelf tools to produce high-quality multimedia texts
Manny Rayner | Belinda Chiera | Cathy Chua
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association

We present an overview of LARA, the Learning And Reading Assistant, an open source platform for easy creation and use of multimedia annotated texts designed to support the improvement of reading skills. The paper is divided into three parts. In the first, we give a brief summary of LARA’s processing. In the second, we describe some generic functionality especially relevant for reading assistance: support for phonetically annotated texts, support for image-based texts, and integrated production of text-to-speech (TTS) generated audio. In the third, we outline some of the larger projects so far carried out with LARA, involving development of content for learning second and foreign (L2) languages such as Icelandic, Farsi, Irish, Old Norse and the Australian Aboriginal language Barngarla, where the issues involved overlap with those that arise when trying to help students improve first-language (L1) reading skills. All software and almost all content is freely available.

2021

pdf bib
LARA in the Service of Revivalistics and Documentary Linguistics: Community Engagement and Endangered Languages
Ghil’Ad Zuckermann | Sigurður Vigfússon | Manny Rayner | Neasa Ní Chiaráin | Nedelina Ivanova | Hanieh Habibi | Branislav Bédi
Proceedings of the 4th Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)

2020

LARA (Learning and Reading Assistant) is an open source platform whose purpose is to support easy conversion of plain texts into multimodal online versions suitable for use by language learners. This involves semi-automatically tagging the text, adding other annotations and recording audio. The platform is suitable for creating texts in multiple languages via crowdsourcing techniques that can be used for teaching a language via reading and listening. We present results of initial experiments by various collaborators where we measure the time required to produce substantial LARA resources, up to the length of short novels, in Dutch, English, Farsi, French, German, Icelandic, Irish, Swedish and Turkish. The first results are encouraging. Although there are some startup problems, the conversion task seems manageable for the languages tested so far. The resulting enriched texts are posted online and are freely available in both source and compiled form.

2016

pdf bib
An Open Web Platform for Rule-Based Speech-to-Sign Translation
Manny Rayner | Pierrette Bouillon | Sarah Ebling | Johanna Gerlach | Irene Strasly | Nikos Tsourakis
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib abs
A Shared Task for Spoken CALL?
Claudia Baur | Johanna Gerlach | Manny Rayner | Martin Russell | Helmer Strik
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We argue that the field of spoken CALL needs a shared task in order to facilitate comparisons between different groups and methodologies, and describe a concrete example of such a task, based on data collected from a speech-enabled online tool which has been used to help young Swiss German teens practise skills in English conversation. Items are prompt-response pairs, where the prompt is a piece of German text and the response is a recorded English audio file. The task is to label pairs as “accept” or “reject”, accepting responses which are grammatically and linguistically correct to match a set of hidden gold standard answers as closely as possible. Initial resources are provided so that a scratch system can be constructed with a minimal investment of effort, and in particular without necessarily using a speech recogniser. Training data for the task will be released in June 2016, and test data in January 2017.

2014

pdf bib abs
Using a Serious Game to Collect a Child Learner Speech Corpus
Claudia Baur | Manny Rayner | Nikos Tsourakis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present an English-L2 child learner speech corpus, produced by 14-year-old Swiss German-L1 students in their third year of learning English, which is currently in the process of being collected. The collection method uses a web-enabled multimodal language game implemented using the CALL-SLT platform, in which subjects hold prompted conversations with an animated agent. Prompts consist of a short animated English-language video clip together with a German-language piece of text indicating the semantic content of the requested response. Grammar-based speech understanding is used to decide whether responses are accepted or rejected, and dialogue flow is controlled using a simple XML-based scripting language; the scripts are written to allow multiple dialogue paths, the choice being made randomly. The system is gamified using a score-and-badge framework with four levels of badges. We describe the application, the data collection and annotation procedures, and the initial tranche of data. The full corpus, when complete, should contain at least 5,000 annotated utterances.

pdf bib
A tool for building multilingual voice questionnaires
Alejandro Armando | Pierrette Bouillon | Manny Rayner | Nikos Tsourakis
Proceedings of Translating and the Computer 36

pdf bib abs
CALL-SLT: A Spoken CALL System Based on Grammar and Speech Recognition
Manny Rayner | Nikos Tsourakis | Claudia Baur | Pierrette Bouillon | Johanna Gerlach
Linguistic Issues in Language Technology, Volume 10, 2014

We describe CALL-SLT, a speech-enabled Computer-Assisted Language Learning application where the central idea is to prompt the student with an abstract representation of what they are supposed to say, and then use a combination of grammar-based speech recognition and rule-based translation to rate their response. The system has been developed to the level of a mature prototype, freely deployed on the web, with versions for several languages. We present an overview of the core system architecture and the various types of content we have developed. Finally, we describe several evaluations, the last of which is a study carried out over about a week using 130 subjects recruited through the Amazon Mechanical Turk, in which CALL-SLT was contrasted against a control version where the speech recognition component was disabled. The improvement in student learning performance between the two groups was significant at p < 0.02.

2013

pdf bib
Two Approaches to Correcting Homophone Confusions in a Hybrid Machine Translation System
Pierrette Bouillon | Johanna Gerlach | Ulrich Germann | Barry Haddow | Manny Rayner
Proceedings of the Second Workshop on Hybrid Approaches to Translation

2012

pdf bib abs
Using Source-Language Transformations to Address Register Mismatches in SMT
Manny Rayner | Pierrette Bouillon | Barry Haddow
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

Mismatches between training and test data are a ubiquitous problem for real SMT applications. In this paper, we examine a type of mismatch that commonly arises when translating from French and similar languages: available training data is mostly formal register, but test data may well be informal register. We consider methods for defining surface transformations that map common informal language constructions into their formal language counterparts, or vice versa; we then describe two ways to use these mappings, either to create artificial training data or to pre-process source text at run-time. An initial evaluation performed using crowd-sourced comparisons of alternate translations produced by a French-to-English SMT system suggests that both methods can improve performance, with run-time pre-processing being the more effective of the two.

pdf bib abs
Evaluating Appropriateness Of System Responses In A Spoken CALL Game
Manny Rayner | Pierrette Bouillon | Johanna Gerlach
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe an experiment carried out using a French version of CALL-SLT, a web-enabled CALL game in which students at each turn are prompted to give a semi-free spoken response which the system then either accepts or rejects. The central question we investigate is whether the response is appropriate; we do this by extracting pairs of utterances where both members of the pair are responses by the same student to the same prompt, and where one response is accepted and one rejected. When the two spoken responses are presented in random order, native speakers show a reasonable degree of agreement in judging that the accepted utterance is better than the rejected one. We discuss the significance of the results and also present a small study supporting the claim that native speakers are nearly always recognised by the system, while non-native speakers are rejected a significant proportion of the time.

pdf bib abs
A Scalable Architecture For Web Deployment of Spoken Dialogue Systems
Matthew Fuchs | Nikos Tsourakis | Manny Rayner
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe a scalable architecture, particularly well-suited to cloud-based computing, which can be used for Web-deployment of spoken dialogue systems. In common with similar platforms, like WAMI and the Nuance Mobile Developer Platform, we use a client/server approach in which speech recognition is carried out on the server side; our architecture, however, differs from these systems in offering considerably more elaborate server-side functionality, based on large-scale grammar-based language processing and generic dialogue management. We describe two substantial applications, built using our framework, which we argue would have been hard to construct in WAMI or NMDP. Finally, we present a series of evaluations carried out using CALL-SLT, a speech translation game, where we contrast performance in Web and desktop versions. Task Error Rate in the Web version is only slightly inferior to that in the desktop one, and the average additional latency is under half a second. The software is generally available for research purposes.

pdf bib abs
A Corpus for a Gesture-Controlled Mobile Spoken Dialogue System
Nikos Tsourakis | Manny Rayner
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Speech and hand gestures offer the most natural modalities for everyday human-to-human interaction. The availability of diverse spoken dialogue applications and the proliferation of accelerometers on consumer electronics allow the introduction of new interaction paradigms based on speech and gestures. Little attention has been paid however to the manipulation of spoken dialogue systems through gestures. Situation-induced disabilities or real disabilities are determinant factors that motivate this type of interaction. In this paper we propose six concise and intuitively meaningful gestures that can be used to trigger the commands in any SDS. Using different machine learning techniques we achieve a classification error for the gesture patterns of less than 5%, and we also compare our own set of gestures to ones proposed by users. Finally, we examine the social acceptability of the specific interaction scheme and encounter high levels of acceptance for public use.

2011

pdf bib
Pour une interlangue utile en traduction automatique de la parole dans des domaines limités [Towards an interlingua for speech translation in limited domains]
Pierrette Bouillon | Manny Rayner | Paula Estrella | Johanna Gerlach | Maria Georgescul
Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]

pdf bib
Bootstrapping a statistical speech translator from a rule-based one
Manny Rayner | Paula Estrella | Pierrette Bouillon
Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation

2010

pdf bib
A Bootstrapped Interlingua-Based SMT Architecture
Manny Rayner | Paula Estrella | Pierrette Bouillon
Proceedings of the 14th Annual Conference of the European Association for Machine Translation

pdf bib abs
Examining the Effects of Rephrasing User Input on Two Mobile Spoken Language Systems
Nikos Tsourakis | Agnes Lisowska | Manny Rayner | Pierrette Bouillon
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

During the construction of a spoken dialogue system much effort is spent on improving the quality of speech recognition as much as possible. However, even if an application perfectly recognizes the input, its understanding may be far from what the user originally meant. The user should be informed about what the system actually understood so that an error will not have a negative impact in the later stages of the dialogue. One important aspect that this work tries to address is the effect of presenting the system's understanding during interaction with users. We argue that for specific kinds of applications it is important to confirm the understanding of the system before obtaining the output. In this way the user can avoid misconceptions and problems occurring in the dialogue flow and can gain confidence in the system. Nevertheless this has an impact on the interaction, as the mental workload increases, and the user's behavior may adapt to the system's coverage. We focus on two applications that implement the notion of rephrasing the user's input in different ways. Our study was carried out with 14 subjects who used both systems on a Nokia N810 Internet Tablet.

We describe a multilingual Open Source CALL game, CALL-SLT, which reuses speech translation technology developed using the Regulus platform to create an automatic conversation partner that allows intermediate-level language students to improve their fluency. We contrast CALL-SLT with Wang's and Seneff's “translation game” system, in particular focussing on three issues. First, we argue that the grammar-based recognition architecture offered by Regulus is more suitable for this type of application; second, that it is preferable to prompt the student in a language-neutral form, rather than in the L1; and third, that we can profitably record successful interactions by native speakers and store them to be reused as online help for students. The current system, which will be demoed at the conference, supports four L2s (English, French, Japanese and Swedish) and two L1s (English and French). We conclude by describing an evaluation exercise, where a version of CALL-SLT configured for English L2 and French L1 was used by several hundred high school students. About half of the subjects reported positive impressions of the system.

2009

pdf bib
Using Artificial Data to Compare the Difficulty of Using Statistical Machine Translation in Different Language-Pairs
Manny Rayner | Paula Estrella | Pierrette Bouillon | Yukie Nakao
Proceedings of Machine Translation Summit XII: Posters

pdf bib
Using Paraphrases of Deep Semantic Representations to Support Regression Testing in Spoken Dialogue Systems
Beth Ann Hockey | Manny Rayner
Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)

pdf bib
Using Artificially Generated Data to Evaluate Statistical Machine Translation
Manny Rayner | Paula Estrella | Pierrette Bouillon | Beth Ann Hockey | Yukie Nakao
Proceedings of the 2009 Workshop on Grammar Engineering Across Frameworks (GEAF 2009)

2008

pdf bib
Comparing two different bidirectional versions of the limited-domain medical spoken language translator MedSLT
Marianne Starlander | Pierrette Bouillon | Glenn Flores | Manny Rayner | Nikos Tsourakis
Proceedings of the 12th Annual Conference of the European Association for Machine Translation

pdf bib
Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications
Pierrette Bouillon | Farzad Ehsani | Robert Frederking | Michael McTear | Manny Rayner
Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications

pdf bib
Making Speech Look Like Text in the Regulus Development Environment
Elisabeth Kron | Manny Rayner | Marianne Santaholma | Pierrette Bouillon | Agnes Lisowska
Coling 2008: Proceedings of the workshop on Grammar Engineering Across Frameworks

We describe recent work on MedSLT, a medium-vocabulary interlingua-based medical speech translation system, focussing on issues that arise when handling languages of which the grammar engineer has little or no knowledge. We show how we can systematically create and maintain multiple forms of grammars, lexica and interlingual representations, with some versions being used by language informants, and some by grammar engineers. In particular, we describe the advantages of structuring the interlingua definition as a simple semantic grammar, which includes a human-readable surface form. We show how this allows us to rationalise the process of evaluating translations between languages lacking common speakers, and also makes it possible to create a simple generic tool for debugging to-interlingua translation rules. Examples presented focus on the concrete case of translation between Japanese and Arabic in both directions.

pdf bib abs
Building Mobile Spoken Dialogue Applications Using Regulus
Nikos Tsourakis | Maria Georgescul | Pierrette Bouillon | Manny Rayner
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Regulus is an Open Source platform that supports construction of rule-based medium-vocabulary spoken dialogue applications. It has already been used to build several substantial speech-enabled applications, including NASA's Clarissa procedure navigator and Geneva University's MedSLT medical speech translator. Systems like these would be far more useful if they were available on a hand-held device, rather than, as with the present version, on a laptop. In this paper we describe the Open Source framework we have developed, which makes it possible to run Regulus applications on generally available mobile devices, using a distributed client-server architecture that offers transparent and reliable integration with different types of ASR systems. We describe the architecture, an implemented calendar application prototype hosted on a mobile device, and an evaluation. The evaluation shows that performance on the mobile device is as good as performance on a normal desktop PC.

pdf bib abs
Many-to-Many Multilingual Medical Speech Translation on a PDA
Kyoko Kanzaki | Yukie Nakao | Manny Rayner | Marianne Santaholma | Marianne Starlander | Nikos Tsourakis
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT

Particularly considering the requirement of high reliability, we argue that the most appropriate architecture for a medical speech translator that can be realised using today’s technology combines unidirectional (doctor to patient) translation, medium-vocabulary controlled language coverage, interlingua-based translation, an embedded help component, and deployability on a hand-held hardware platform. We present an overview of the Open Source MedSLT prototype, which has been developed in accordance with these design principles. The system is implemented on top of the Regulus and Nuance 8.5 platforms, translates patient examination questions for all language pairs in the set {English, French, Japanese, Arabic, Catalan}, using vocabularies of about 400 to 1,100 words, and can be run in a distributed client/server environment, where the client application is hosted on a Nokia Internet Tablet device.

pdf bib
Almost Flat Functional Semantics for Speech Translation
Manny Rayner | Pierrette Bouillon | Beth Ann Hockey | Yukie Nakao
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf bib abs
Les ellipses dans un système de traduction automatique de la parole
Pierrette Bouillon | Manny Rayner | Marianne Starlander | Marianne Santaholma
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Elliptical sentences are very common in any dialogue. In this article, we evaluate their impact on recognition and translation in the MedSLT speech translation system. Ellipsis resolution is performed there using a robust and portable method borrowed from human-machine dialogue systems. This method exploits a flat semantic representation and combines linguistic techniques (to build the representation) with example-based techniques (to learn from a corpus what constitutes a well-formed ellipsis in a given sub-domain and how to resolve it).

pdf bib
Adapting a Medical speech to speech translation system (MedSLT) to Arabic
Pierrette Bouillon | Sonia Halimi | Manny Rayner | Beth Ann Hockey
Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources

pdf bib
Proceedings of the Workshop on Grammar-Based Approaches to Spoken Language Processing
Pierrette Bouillon | Manny Rayner
Proceedings of the Workshop on Grammar-Based Approaches to Spoken Language Processing

pdf bib
A Development Environment for Building Grammar-Based Speech-Enabled Applications
Elisabeth Kron | Manny Rayner | Marianne Santaholma | Pierrette Bouillon
Proceedings of the Workshop on Grammar-Based Approaches to Spoken Language Processing

2006

pdf bib
Proceedings of the First International Workshop on Medical Speech Translation
Pierrette Bouillon | Farzad Ehsani | Robert Frederking | Manny Rayner
Proceedings of the First International Workshop on Medical Speech Translation

pdf bib
Evaluating Task Performance for a Unidirectional Controlled Language Medical Speech Translation System
Nikos Chatzichrisafis | Pierrette Bouillon | Manny Rayner | Marianne Santaholma | Marianne Starlander | Beth Ann Hockey
Proceedings of the First International Workshop on Medical Speech Translation

pdf bib
Une grammaire partagée multitâche pour le traitement de la parole : application aux langues romanes [A multitask shared grammar for speech processing: application to romance languages]
Pierrette Bouillon | Manny Rayner | Bruna Novellas | Marianne Starlander | Marianne Santaholma | Yukie Nakao | Nikos Chatzichrisafis
Traitement Automatique des Langues, Volume 47, Numéro 3 : Varia [Varia]

pdf bib abs
Une grammaire multilingue partagée pour la traduction automatique de la parole
Pierrette Bouillon | Manny Rayner | Bruna Novellas | Yukie Nakao | Marianne Santaholma | Marianne Starlander | Nikos Chatzichrisafis
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Today, the most common approach in speech processing is to combine a statistical recogniser with a robust parser. For many applications, however, grammar-based linguistic recognisers offer numerous advantages. In this article, we present a methodology and a set of free software tools (called Regulus) for rapidly deriving linguistically motivated recognisers from a general grammar shared between Catalan and French.

pdf bib abs
REGULUS: A Generic Multilingual Open Source Platform for Grammar-Based Speech Applications
Manny Rayner | Pierrette Bouillon | Beth Ann Hockey | Nikos Chatzichrisafis
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We present an overview of Regulus, an Open Source platform that supports corpus-based derivation of efficient domain-specific speech recognisers from general linguistically motivated unification grammars. We list available Open Source resources, which include compilers, resource grammars for various languages, documentation and a development environment. The greater part of the paper presents a series of experiments carried out using a medium-vocabulary medical speech translation application and a corpus of 801 recorded domain utterances, designed to investigate the impact on speech understanding performance of vocabulary size, grammatical coverage, presence or absence of various linguistic features, degree of generality of the grammar and use or otherwise of probabilistic weighting in the CFG language model. In terms of task accuracy, the most significant factors were the use of probabilistic weighting, the degree of generality of the grammar and the inclusion of features which model sortal restrictions.

2005

pdf bib
A Voice Enabled Procedure Browser for the International Space Station
Manny Rayner | Beth A. Hockey | Nikos Chatzichrisafis | Kim Farrell | Jean-Michel Renders
Proceedings of the ACL Interactive Poster and Demonstration Sessions

pdf bib abs
Representational and architectural issues in a limited-domain medical speech translator
Manny Rayner | Pierrette Bouillon | Marianne Santaholma | Yukie Nakao
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

We present an overview of MedSLT, a medium-vocabulary medical speech translation system, focussing on the representational issues that arise when translating temporal and causal concepts. Although flat key/value structures are strongly preferred as semantic representations in speech understanding systems, we argue that it is infeasible to handle the necessary range of concepts using only flat structures. By exploiting the specific nature of the task, we show that it is possible to implement a solution which only slightly extends the representational complexity of the semantic representation language, by permitting an optional single nested level representing a subordinate clause construct. We sketch our solutions to the key problems of producing minimally nested representations using phrase-spotting methods, and writing cleanly structured rule-sets that map temporal and phrasal representations into a canonical interlingual form.

In this paper, we present evidence that providing users of a speech to speech translation system for emergency diagnosis (MedSLT) with a tool that helps them to learn the coverage greatly improves their success in using the system. In MedSLT, the system uses a grammar-based recogniser that provides more predictable results to the translation component. The help module aims at addressing the lack of robustness inherent in this type of approach. It takes as input the result of a robust statistical recogniser that performs better for out-of-coverage data and produces a list of in-coverage example sentences. These examples are selected from a defined list using a heuristic that prioritises sentences maximising the number of N-grams shared with those extracted from the recognition result.
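
The selection heuristic described in the abstract above can be illustrated with a minimal sketch. This is not the MedSLT implementation: the function names, the whitespace tokenisation, the n-gram lengths (1 to 3) and the cut-off `top_k` are all assumptions made for illustration. It ranks a predefined list of in-coverage sentences by how many distinct N-grams each shares with the output of the statistical recogniser.

```python
def ngrams(tokens, max_n=3):
    """Return the set of all n-grams (as tuples) of length 1..max_n."""
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(tuple(tokens[i:i + n]))
    return grams


def rank_help_examples(recognised, examples, top_k=5, max_n=3):
    """Rank in-coverage example sentences by shared n-grams.

    `recognised` is the (hypothetical) string produced by the robust
    statistical recogniser; `examples` is the predefined list of
    in-coverage sentences. Sentences sharing no n-gram are dropped.
    """
    target = ngrams(recognised.lower().split(), max_n)
    scored = [
        (len(target & ngrams(ex.lower().split(), max_n)), ex)
        for ex in examples
    ]
    # Python's sort is stable, so ties keep their original list order.
    scored.sort(key=lambda pair: -pair[0])
    return [ex for score, ex in scored[:top_k] if score > 0]


if __name__ == "__main__":
    help_list = [
        "do you have a headache",
        "is the pain severe",
        "does the pain last long",
    ]
    for sentence in rank_help_examples("does your pain last very long", help_list):
        print(sentence)
```

Counting distinct shared N-grams of several lengths rewards examples that preserve word order as well as vocabulary; a real system might weight longer N-grams more heavily, which this sketch does not attempt.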