2020
The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software
Roland Kuhn | Fineen Davis | Alain Désilets | Eric Joanis | Anna Kazantseva | Rebecca Knowles | Patrick Littell | Delaney Lothian | Aidan Pine | Caroline Running Wolf | Eddie Santos | Darlene Stewart | Gilles Boulianne | Vishwa Gupta | Brian Maracle Owennatékha | Akwiratékha’ Martin | Christopher Cox | Marie-Odile Junker | Olivia Sammons | Delasie Torkornoo | Nathan Thanyehténhas Brinklow | Sara Child | Benoît Farley | David Huggins-Daines | Daisy Rosenblum | Heather Souter
Proceedings of the 28th International Conference on Computational Linguistics
This paper surveys the first, three-year phase of a project at the National Research Council of Canada that is developing software to assist Indigenous communities in Canada in preserving their languages and extending their use. The project aimed to work within the empowerment paradigm, where collaboration with communities and fulfillment of their goals are central. Since many of the technologies we developed were in response to community needs, the project ended up as a collection of diverse subprojects, including: the creation of a sophisticated framework for building verb conjugators for highly inflectional polysynthetic languages (such as Kanyen’kéha, in the Iroquoian language family); the release of what is probably the largest available corpus of sentences in a polysynthetic language (Inuktut) aligned with English sentences, along with experiments with machine translation (MT) systems trained on this corpus; free online services based on automatic speech recognition (ASR) for easing the transcription bottleneck for recordings of speech in Indigenous languages (and other languages); software for implementing text prediction and read-along audiobooks for Indigenous languages; and several other subprojects.
2010
WeBiText: Multilingual Concordancer Built from Public High Quality Web Content
Alain Désilets
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program
In this paper, we describe WeBiText (www.webitext.ca) and how it is being used. WeBiText is a concordancer that allows translators to search large, high-quality multilingual web sites in order to find solutions to translation problems. After a quick overview of the system, we present results from an analysis of its logs, which provides a picture of how the tool is being used and how well it performs. We show that it is mostly used to find solutions for short, two- or three-word translation problems. The system produces at least one hit for 58% of the queries, and hits from at least five different web pages in 41% of cases. We show that 36% of the queries correspond to specialized language problems, which is much higher than what was previously reported for a similar concordancer based on the Canadian Hansard (TransSearch). We also provide a back-of-the-envelope calculation of the current economic impact of the tool, which we estimate at $1 million per year and growing rapidly.
2009
Building a collaborative multilingual terminology system
Alain Désilets | Louis-Philippe Huberdeau | Marc Laporte | Jean Quirion
Proceedings of Translating and the Computer 31
Using First and Second Language Models to Correct Preposition Errors in Second Language Authoring
Matthieu Hermet | Alain Désilets
Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications
Using Automatic Roundtrip Translation to Repair General Errors in Second Language Writing
Alain Désilets | Matthieu Hermet
Proceedings of Machine Translation Summit XII: Posters
Up close and personal with a Translator – How Translators Really Work
Alain Désilets | UQO | Geneviève Patenaude
Proceedings of Machine Translation Summit XII: Tutorials
How Translators Use Tools and Resources to Resolve Translation Problems: an Ethnographic Study
Alain Désilets | Christiane Melançon | Geneviève Patenaude | Louise Brunette
Beyond Translation Memories: New Tools for Translators Workshop
2008
Using the Web as a Linguistic Resource to Automatically Correct Lexico-Syntactic Errors
Matthieu Hermet | Alain Désilets | Stan Szpakowicz
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper presents an algorithm for correcting language errors typical of second-language learners. We focus on preposition errors, which are very common among second-language learners but are not addressed well by current commercial grammar correctors and editing aids. The algorithm takes as input a sentence containing a preposition error (and possibly other errors as well), and outputs the correct preposition for that particular sentence context. We use a two-phase hybrid rule-based and statistical approach. In the first phase, rule-based processing is used to generate a short expression that captures the context of use of the preposition in the input sentence. In the second phase, Web searches are used to evaluate the frequency of this expression, when alternative prepositions are used instead of the original one. We tested this algorithm on a corpus of 133 French sentences written by intermediate second-language learners, and found that it could address 69.9% of those cases. In contrast, we found that the best French grammar and spell checker currently on the market, Antidote, addressed only 3% of those cases. We also showed that performance degrades gracefully when using a corpus of frequent n-grams to evaluate frequencies.
Evaluating productivity gains of hybrid ASR-MT systems for translation dictation
Alain Désilets | Marta Stojanovic | Jean-François Lapointe | Rick Rose | Aarthi Reddy
Proceedings of the 5th International Workshop on Spoken Language Translation: Papers
This paper is about Translation Dictation with ASR, that is, the use of Automatic Speech Recognition (ASR) by human translators to dictate translations. We are particularly interested in the productivity gains that this could provide over conventional keyboard input, and in ways in which such gains might be increased through a combination of ASR and Statistical Machine Translation (SMT). In this hybrid technology, the source language text is presented to both the human translator and an SMT system. The latter produces N-best translation hypotheses, which are then used to fine-tune the ASR language model and vocabulary towards utterances that are probable translations of source text sentences. We conducted an ergonomic experiment with eight professional translators dictating into French, using a top-of-the-line, off-the-shelf ASR system (Dragon NaturallySpeaking 8). We found that the ASR system had an average Word Error Rate (WER) of 11.7 percent, and that translation using this system did not provide statistically significant productivity increases over keyboard input when following the manufacturer-recommended procedure for error correction. However, we found indications that, even in its current imperfect state, French ASR might be beneficial to translators who are already used to dictation (either with ASR or a dictaphone), but more focused experiments are needed to confirm this. We also found that dictation using an ASR with WER of 4 percent or less would have resulted in statistically significant (p less than 0.6) productivity gains on the order of 25.1 percent to 44.9 percent Translated Words Per Minute. We also evaluated the extent to which the limited manufacturer-provided Domain Adaptation features could be used to positively bias the ASR using SMT hypotheses. We found that the relative gains in WER were much lower than has been reported in the literature for tighter integration of SMT with ASR, pointing to the advantages of tight integration approaches and the need for more research in that area.
Reliable Innovation: A Tecchie’s Travels in the Land of Translators
Alain Désilets | Louise Brunette | Christiane Melançon | Geneviève Patenaude
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT
WeBiText: Building Large Heterogeneous Translation Memories from Parallel Web Content
Alain Désilets
Proceedings of Translating and the Computer 30
2007
Translation Wikified: how will massive online collaboration impact the world of translation?
Alain Désilets
Proceedings of Translating and the Computer 29
2005
Semantic Similarity for Detecting Recognition Errors in Automatic Speech Transcripts
Diana Inkpen | Alain Désilets
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing