Marianne Santaholma

2009

Comparing Speech Recognizers Derived from Mono- and Multilingual Grammars
Marianne Santaholma
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues

This paper examines the performance of multilingual parameterized grammar rules in speech recognition. We compare the performance of two types of Japanese and English grammar-based speech recognizers. One system is derived from monolingual grammar rules and the other from multilingual parameterized grammar rules. The latter thus uses the same grammar rules to create the language models for both languages. We carried out speech recognition experiments on a limited-domain dialog application. These experiments show that the language models derived from multilingual parameterized grammar rules (1) perform equally well on both tested languages, English and Japanese, and (2) perform comparably to recognizers derived from monolingual grammars that were developed explicitly for these languages. This suggests that sharing grammar resources between languages could be one way to develop rule-based speech recognizers more efficiently.

Proceedings of the 2009 Workshop on Grammar Engineering Across Frameworks (GEAF 2009)
Tracy Holloway King | Marianne Santaholma

2008

Many-to-Many Multilingual Medical Speech Translation on a PDA
Kyoko Kanzaki | Yukie Nakao | Manny Rayner | Marianne Santaholma | Marianne Starlander | Nikos Tsourakis
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT

Particularly considering the requirement of high reliability, we argue that the most appropriate architecture for a medical speech translator that can be realised using today’s technology combines unidirectional (doctor to patient) translation, medium-vocabulary controlled language coverage, interlingua-based translation, an embedded help component, and deployability on a hand-held hardware platform. We present an overview of the Open Source MedSLT prototype, which has been developed in accordance with these design principles. The system is implemented on top of the Regulus and Nuance 8.5 platforms, translates patient examination questions for all language pairs in the set {English, French, Japanese, Arabic, Catalan}, using vocabularies of about 400 to 1,100 words, and can be run in a distributed client/server environment, where the client application is hosted on a Nokia Internet Tablet device.

Multilingual Grammar Resources in Multilingual Application Development
Marianne Santaholma
Coling 2008: Proceedings of the workshop on Grammar Engineering Across Frameworks

Making Speech Look Like Text in the Regulus Development Environment
Elisabeth Kron | Manny Rayner | Marianne Santaholma | Pierrette Bouillon | Agnes Lisowska
Coling 2008: Proceedings of the workshop on Grammar Engineering Across Frameworks

A Knowledge-Modeling Approach for Multilingual Regulus Lexica
Marianne Santaholma | Nikos Chatzichrisafis
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Development of lexical resources is, along with grammar development, one of the main efforts when building multilingual NLP applications. In this paper, we present a tool-based approach for more efficient manual lexicon development for a spoken language translation system. In particular, the approach addresses common problems of multilingual lexica, including redundancy of encoded information and inconsistency between the lexica of different languages. The general benefits of this practical tool-based approach are a clear and user-friendly lexicon structure, inheritance of information within a language and between different system languages, and transparency and consistency of coverage between system languages. The visual tool-based approach is user-friendly for linguistic informants who have no previous experience of lexicon development, while remaining a powerful tool for expert system developers.

2007

Grammar Sharing Techniques for Rule-based Multilingual NLP Systems
Marianne Santaholma
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

Les ellipses dans un système de traduction automatique de la parole [Ellipses in an automatic speech translation system]
Pierrette Bouillon | Manny Rayner | Marianne Starlander | Marianne Santaholma
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Elliptical sentences are very frequent in any dialogue. In this paper, we evaluate their impact on recognition and translation in the MedSLT speech translation system. In MedSLT, ellipsis resolution is performed by a robust and portable method borrowed from human-machine dialogue systems. This method exploits a flat semantic representation and combines linguistic techniques (to build the representation) with example-based techniques (to learn from a corpus what a well-formed ellipsis is in a given subdomain and how to resolve it).

A Development Environment for Building Grammar-Based Speech-Enabled Applications
Elisabeth Kron | Manny Rayner | Marianne Santaholma | Pierrette Bouillon
Proceedings of the Workshop on Grammar-Based Approaches to Spoken Language Processing

2006

Une grammaire multilingue partagée pour la traduction automatique de la parole [A shared multilingual grammar for automatic speech translation]
Pierrette Bouillon | Manny Rayner | Bruna Novellas | Yukie Nakao | Marianne Santaholma | Marianne Starlander | Nikos Chatzichrisafis
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Today, the most common approach in speech processing is to combine a statistical recognizer with a robust parser. For many applications, however, grammar-based linguistic recognizers offer numerous advantages. In this paper, we present a methodology and a set of open-source software tools (called Regulus) for rapidly deriving linguistically motivated recognizers from a general grammar shared between Catalan and French.

Une grammaire partagée multitâche pour le traitement de la parole : application aux langues romanes [A multitask shared grammar for speech processing: application to romance languages]
Pierrette Bouillon | Manny Rayner | Bruna Novellas | Marianne Starlander | Marianne Santaholma | Yukie Nakao | Nikos Chatzichrisafis
Traitement Automatique des Langues, Volume 47, Numéro 3 : Varia [Varia]

Evaluating Task Performance for a Unidirectional Controlled Language Medical Speech Translation System
Nikos Chatzichrisafis | Pierrette Bouillon | Manny Rayner | Marianne Santaholma | Marianne Starlander | Beth Ann Hockey
Proceedings of the First International Workshop on Medical Speech Translation

2005

Representational and architectural issues in a limited-domain medical speech translator
Manny Rayner | Pierrette Bouillon | Marianne Santaholma | Yukie Nakao
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

We present an overview of MedSLT, a medium-vocabulary medical speech translation system, focussing on the representational issues that arise when translating temporal and causal concepts. Although flat key/value structures are strongly preferred as semantic representations in speech understanding systems, we argue that it is infeasible to handle the necessary range of concepts using only flat structures. By exploiting the specific nature of the task, we show that it is possible to implement a solution which only slightly extends the representational complexity of the semantic representation language, by permitting an optional single nested level representing a subordinate clause construct. We sketch our solutions to the key problems of producing minimally nested representations using phrase-spotting methods, and writing cleanly structured rule-sets that map temporal and phrasal representations into a canonical interlingual form.

Linguistic representation of Finnish in a limited domain speech-to-speech translation system
Marianne Santaholma
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

In this paper, we present evidence that providing users of a speech-to-speech translation system for emergency diagnosis (MedSLT) with a tool that helps them learn the system's coverage greatly improves their success in using it. MedSLT uses a grammar-based recogniser that provides more predictable results to the translation component. The help module aims to address the lack of robustness inherent in this type of approach. It takes as input the result of a robust statistical recogniser, which performs better on out-of-coverage data, and produces a list of in-coverage example sentences. These examples are selected from a predefined list using a heuristic that prioritises sentences sharing the largest number of N-grams with those extracted from the recognition result.

Linguistic representation of Finnish in the medical domain spoken language translation system
Marianne Santaholma
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues

This paper describes the development of Finnish linguistic resources for use in MedSLT, an Open Source medical domain speech-to-speech translation system. The paper describes the collection of medical Finnish corpora, the creation of a Finnish grammar by adapting the original English grammar, the composition of a domain-specific Finnish lexicon, and the definition of interlingua-to-Finnish mapping rules for multilingual translation. It is shown that Finnish can be effectively introduced into the existing MedSLT framework and that, despite the differences between English and Finnish, the Finnish grammar can be created by manual adaptation of the original English grammar. Regarding further development, the initial evaluation results of English-Finnish speech-to-speech translation are encouraging.