Michael Carl


2021

Word Alignment Dissimilarity Indicator: Alignment Links as Conceptualizations of a Focused Bilingual Lexicon
Devin Gilbert | Michael Carl
Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age

2019

Proceedings of the Second MEMENTO workshop on Modelling Parameters of Cognitive Effort in Translation Production
Michael Carl | Silvia Hansen-Schirra
Proceedings of the Second MEMENTO workshop on Modelling Parameters of Cognitive Effort in Translation Production

Lexical Representation & Retrieval on Monolingual Interpretative text production
Debasish Sahoo | Michael Carl
Proceedings of the Second MEMENTO workshop on Modelling Parameters of Cognitive Effort in Translation Production

2018

Literality and cognitive effort: Japanese and Spanish
Isabel Lacruz | Michael Carl | Masaru Yamada
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Experiments in Non-Coherent Post-editing
Cristina Toledo Báez | Moritz Schaeffer | Michael Carl
Proceedings of the Workshop Human-Informed Translation and Interpreting Technology

Market pressure on translation productivity joined with technological innovation is likely to fragment and decontextualise translation jobs even more than is currently the case. Many different translators increasingly work on one document at different places, collaboratively working in the cloud. This paper investigates the effect of decontextualised source texts on behaviour by comparing post-editing of sequentially ordered sentences with shuffled sentences from two different texts. The findings suggest that there is little or no effect of the decontextualised source texts on behaviour.

2016

English-to-Japanese Translation vs. Dictation vs. Post-editing: Comparing Translation Modes in a Multilingual Setting
Michael Carl | Akiko Aizawa | Masaru Yamada
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Speech-enabled interfaces have the potential to become one of the most efficient and ergonomic environments for human-computer interaction and for text production. However, not much research has been carried out to investigate in detail the processes and strategies involved in the different modes of text production. This paper introduces and evaluates a corpus of more than 55 hours of English-to-Japanese user activity data that were collected within the ENJA15 project, in which translators were observed while writing and speaking translations (translation dictation) and during machine translation post-editing. The transcription of the spoken data, keyboard logging and eye-tracking data were recorded with Translog-II, post-processed and integrated into the CRITT Translation Process Research-DB (TPR-DB), which is publicly available under a Creative Commons license. The paper presents the ENJA15 data as part of a large multilingual Chinese, Danish, German, Hindi and Spanish translation process data collection of more than 760 translation sessions. It compares the ENJA15 data with the other language pairs and reviews some of its particularities.

Measuring Cognitive Translation Effort with Activity Units
Moritz Jonas Schaeffer | Michael Carl | Isabel Lacruz | Akiko Aizawa
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

2014

CASMACAT: A Computer-assisted Translation Workbench
Vicent Alabau | Christian Buck | Michael Carl | Francisco Casacuberta | Mercedes García-Martínez | Ulrich Germann | Jesús González-Rubio | Robin Hill | Philipp Koehn | Luis Leiva | Bartolomé Mesa-Lao | Daniel Ortiz-Martínez | Herve Saint-Amand | Germán Sanchis Trilles | Chara Tsoukala
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

Integrating online and active learning in a computer-assisted translation workbench
Vicent Alabau | Jesús González-Rubio | Daniel Ortiz-Martínez | Germán Sanchis-Trilles | Francisco Casacuberta | Mercedes García-Martínez | Bartolomé Mesa-Lao | Dan Cheung Petersen | Barbara Dragsted | Michael Carl
Workshop on interactive and adaptive machine translation

This paper describes a pilot study with a computer-assisted translation workbench aiming at testing the integration of online learning (OL) and active learning (AL) features. We investigate the effect of these features on translation productivity, using interactive translation prediction (ITP) as a baseline. User activity data were collected from five beta testers using key-logging and eye-tracking. User feedback was also collected at the end of the experiments in the form of retrospective think-aloud protocols. We found that OL performs better than ITP, especially in terms of translation speed. In addition, AL provides better translation quality than ITP for the same levels of user effort. We plan to incorporate these features in the final version of the workbench.

Predicting post-editor profiles from the translation process
Karan Singla | David Orrego-Carmona | Ashleigh Rhea Gonzales | Michael Carl | Srinivas Bangalore
Workshop on interactive and adaptive machine translation

The purpose of the current investigation is to predict post-editor profiles based on user behaviour and demographics using machine learning techniques to gain a better understanding of post-editor styles. Our study extracts process unit features from the CasMaCat LS14 database from the CRITT Translation Process Research Database (TPR-DB). The analysis has two main research goals: We create n-gram models based on user activity and part-of-speech sequences to automatically cluster post-editors, and we use discriminative classifier models to characterize post-editors based on a diverse range of translation process features. The classification and clustering of participants resulting from our study suggest this type of exploration could be used as a tool to develop new translation tool features or customization possibilities.

Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation
Ulrich Germann | Michael Carl | Philipp Koehn | Germán Sanchis-Trilles | Francisco Casacuberta | Robin Hill | Sharon O’Brien
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

Measuring the Cognitive Effort of Literal Translation Processes
Moritz Schaeffer | Michael Carl
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

CFT13: A resource for research into the post-editing process
Michael Carl | Mercedes Martínez García | Bartolomé Mesa-Lao
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes the most recent dataset that has been added to the CRITT Translation Process Research Database (TPR-DB). Under the name CFT13, this new study contains user activity data (UAD) in the form of key-logging and eye-tracking collected during the second CasMaCat field trial in June 2013. The CFT13 is a publicly available resource featuring a number of simple and compound process and product units suited to investigate human-computer interaction while post-editing machine translation outputs.

Evaluating the effects of interactivity in a post-editing workbench
Nancy Underwood | Bartolomé Mesa-Lao | Mercedes García Martínez | Michael Carl | Vicent Alabau | Jesús González-Rubio | Luis A. Leiva | Germán Sanchis-Trilles | Daniel Ortíz-Martínez | Francisco Casacuberta
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes the field trial and subsequent evaluation of a post-editing workbench which is currently under development in the EU-funded CasMaCat project. Based on user evaluations of the initial prototype of the workbench, this second prototype of the workbench includes a number of interactive features designed to improve productivity and user satisfaction. Using CasMaCat’s own facilities for logging keystrokes and eye tracking, data were collected from nine post-editors in a professional setting. These data were then used to investigate the effects of the interactive features on productivity, quality, user satisfaction and cognitive load as reflected in the post-editors’ gaze activity. These quantitative results are combined with the qualitative results derived from user questionnaires and interviews conducted with all the participants.

CASMACAT: cognitive analysis and statistical methods for advanced computer aided translation
Philipp Koehn | Michael Carl | Francisco Casacuberta | Eva Marcos
Proceedings of the 17th Annual conference of the European Association for Machine Translation

SEECAT: ASR & Eye-tracking enabled computer-assisted translation
Mercedes García-Martínez | Karan Singla | Aniruddha Tammewar | Bartolomé Mesa-Lao | Ankita Thakur | Anusuya M.A. | Srinivas Bangalore | Michael Carl
Proceedings of the 17th Annual conference of the European Association for Machine Translation

2013

Automatically Predicting Sentence Translation Difficulty
Abhijit Mishra | Pushpak Bhattacharyya | Michael Carl
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

User Evaluation of Advanced Interaction Features for a Computer-Assisted Translation Workbench
Vicente Alabau | Jesus Gonzalez-Rubio | Luis A. Leiva | Daniel Ortiz-Martínez | German Sanchis-Trilles | Francisco Casacuberta | Bartolomé Mesa-Lao | Ragnar Bonk | Michael Carl | Mercedes Garcia-Martinez
Proceedings of Machine Translation Summit XIV: User track

CASMACAT: Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation
Philipp Koehn | Michael Carl | Francisco Casacuberta | Eva Marcos
Proceedings of Machine Translation Summit XIV: European projects

Advanced computer aided translation with a web-based workbench
Vicent Alabau | Ragnar Bonk | Christian Buck | Michael Carl | Francisco Casacuberta | Mercedes García-Martínez | Jesús González | Philipp Koehn | Luis Leiva | Bartolomé Mesa-Lao | Daniel Oriz | Hervé Saint-Amand | Germán Sanchis | Chara Tsiukala
Proceedings of the 2nd Workshop on Post-editing Technology and Practice

2012

Proceedings of the First Workshop on Eye-tracking and Natural Language Processing
Michael Carl | Pushpak Bhattacharyya | Kamal Kumar Choudhary
Proceedings of the First Workshop on Eye-tracking and Natural Language Processing

A heuristic-based approach for systematic error correction of gaze data for reading
Abhijit Mishra | Michael Carl | Pushpak Bhattacharyya
Proceedings of the First Workshop on Eye-tracking and Natural Language Processing

Translog-II: a Program for Recording User Activity Data for Empirical Reading and Writing Research
Michael Carl
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a novel implementation of Translog-II. Translog-II is a Windows-oriented program to record and study reading and writing processes on a computer. In our research, it is an instrument to acquire objective, digital data of human translation processes. Like its predecessors, Translog 2000 and Translog 2006, Translog-II consists of two main components: Translog-II Supervisor and Translog-II User, which are used to create a project file, to run text production experiments (a user reads, writes or translates a text) and to replay the session. Translog produces a log file which contains all user activity data of the reading, writing, or translation session, and which can be evaluated by external tools. While there is a large body of translation process research based on Translog, this paper gives an overview of the Translog-II functions and its data visualization options.

The CRITT TPR-DB 1.0: A Database for Empirical Human Translation Process Research
Michael Carl
Workshop on Post-Editing Technology and Practice

This paper introduces a publicly available database of recorded translation sessions for Translation Process Research (TPR). User activity data (UAD) of translators' behavior was collected over the past 5 years in several translation studies with Translog, a data acquisition software which logs keystrokes and gaze data during text reception and production. The database compiles this data into a consistent format which can be processed by various visualization and analysis tools.

2010

Correlating Translation Product and Translation Process Data of Professional and Student Translators
Michael Carl | Matthias Buch-Kromann
Proceedings of the 14th Annual conference of the European Association for Machine Translation

A computational framework for a cognitive model of human translation processes
Michael Carl
Proceedings of Translating and the Computer 32

2009

Grounding Translation Tools in Translator’s Activity Data
Michael Carl
Beyond Translation Memories: New Tools for Translators Workshop

2008

Using Log-linear Models for Tuning Machine Translation Output
Michael Carl
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe a set of experiments to explore statistical techniques for ranking and selecting the best translations in a graph of translation hypotheses. In a previous paper (Carl, 2007) we described how the graph of hypotheses is generated through shallow transfer and chunk permutation rules, where nodes consist of vectors representing morpho-syntactic properties of words and phrases. This paper describes a number of methods to train statistical feature functions from some of the vector’s components. The feature functions are trained off-line on different types of text and their log-linear combination is then used to retrieve the best translation paths in the graph. We compare two language modelling toolkits, the CMU and the SRI toolkit, and arrive at three results: 1) models of lemma-based feature functions produce better results than token-based models, 2) adding a PoS-tag feature function to the lemma models improves the output and 3) weights for lexical translations are suitable if the training material is similar to the texts to be translated.

Evaluation of a Machine Translation System for Low Resource Languages: METIS-II
Vincent Vandeghinste | Peter Dirix | Ineke Schuurman | Stella Markantonatou | Sokratis Sofianopoulos | Marina Vassiliou | Olga Yannoutsou | Toni Badia | Maite Melero | Gemma Boleda | Michael Carl | Paul Schmidt
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we describe the METIS-II system and its evaluation on each of the language pairs: Dutch, German, Greek, and Spanish to English. The METIS-II system envisaged developing a data-driven approach in which no parallel corpus is required, and in which no full parser or extensive rule sets are needed. We describe evaluation on a development test set and on a test set coming from Europarl, and compare our results with SYSTRAN. We also provide some further analysis, researching the impact of the number and source of the reference translations and analysing the results according to test text type. The results are, as expected, lower for the METIS system, but not at an unattainable distance from a mature system like SYSTRAN.

Modelling human translator behaviour with user-activity data
Michael Carl | Arnt Lykke Jakobsen | Kristian T.H. Jensen
Proceedings of the 12th Annual conference of the European Association for Machine Translation

2007

Demonstration of the German to English METIS-II MT system
Michael Carl | Sandrine Garnier | Paul Schmidt
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

METIS-II: the German to English MT system
Michael Carl
Proceedings of Machine Translation Summit XI: Papers

2006

A Dictionary Lookup Strategy for Translating of Discontinuous Phrases
Michael Carl | Ecaterina Rascu
Proceedings of the 11th Annual conference of the European Association for Machine Translation

METIS-II: Machine Translation for Low Resource Languages
Vincent Vandeghinste | Ineke Schuurman | Michael Carl | Stella Markantonatou | Toni Badia
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we describe a machine translation prototype in which we use only minimal resources for both the source and the target language. A shallow source language analysis, combined with a translation dictionary and a mapping system of source language phenomena into the target language and a target language corpus for generation are all the resources needed in the described system. Several approaches are presented.

2005

Using template-grammars for shake & bake paraphrasing
Michael Carl | Ecaterina Rascu | Paul Schmidt
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

Using Weighted Abduction to Align Term Variant Translations in Bilingual Texts
Michael Carl | Ecaterina Rascu | Johann Haller
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Controlling Gender Equality with Shallow NLP Techniques
Michael Carl | Sandrine Garnier | Johann Haller | Anne Altmayer | Bärbel Miemietz
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Experimenting with phrase-based statistical translation within the IWSLT Chinese-to-English shared translation task
Philippe Langlais | Michael Carl | Oliver Streiter
Proceedings of the First International Workshop on Spoken Language Translation: Evaluation Campaign

2003

Introduction à la traduction guidée par l’exemple (Traduction par analogie)
Michael Carl
Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Tutoriels

The number of approaches to machine translation has multiplied in recent years. They include, among others, rule-based translation, statistical translation and example-based translation. In this article I describe the main approaches to machine translation. I distinguish approaches based on rules obtained through inspection from approaches based on translation examples. Example-based translation is characterized by the sentence as the ideal translation unit. A new translation is generated by analogy: only the parts that change with respect to a set of known translations are adapted, modified or substituted. I present some techniques that have been used to do this. I discuss one specific system, EDGAR, in more detail. I show how aligned translated texts can be prepared in a compilation step to extract sub-sentential translation units. I present English -> French translation results produced with the EDGAR system and compare them with those of a statistical system.

Phrase-based Evaluation of Word-to-Word Alignments
Michael Carl | Sisay Fissaha
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

Data-assisted controlled translation
Michael Carl
EAMT Workshop: Improving MT through other language technology tools: resources and tools for building MT

Tuning general translation knowledge to a sublanguage
Michael Carl | Philippe Langlais
EAMT Workshop: Improving MT through other language technology tools: resources and tools for building MT

2002

An Intelligent Terminology Database as a Pre-processor for Statistical Machine Translation
Michael Carl | Philippe Langlais
COLING-02: COMPUTERM 2002: Second International Workshop on Computational Terminology

Toward a hybrid integrated translation environment
Michael Carl | Andy Way | Reinhard Schäler
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

In this paper we present a model for the future use of Machine Translation (MT) and Computer Assisted Translation. In order to accommodate the future needs in middle value translations, we discuss a number of MT techniques and architectures. We anticipate a hybrid environment that integrates data- and rule-driven approaches where translations will be routed through the available translation options and consumers will receive accurate information on the quality, pricing and time implications of their translation choice.

2001

Workshop on Example-Based machine Translation
Michael Carl | Andy Way
Workshop on Example-Based machine Translation

Inducing translation grammars from bracketed alignments
Michael Carl
Workshop on Example-Based machine Translation

Inducing probabilistic invertible translation grammars from aligned texts
Michael Carl
Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning (ConLL)

2000

A Model of Competence for Corpus-Based Machine Translation
Michael Carl
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

Combining invertible example-based machine translation with translation memory technology
Michael Carl
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper presents an approach to extract invertible translation examples from pre-aligned reference translations. The set of invertible translation examples is used in the Example-Based Machine Translation (EBMT) system EDGAR for translation. Invertible bilingual grammars eliminate translation ambiguities such that each source language parse tree maps into only one target language string. The translation results of EDGAR are compared and combined with those of a translation memory (TM). It is shown that i) best translation results are achieved for the EBMT system when using a bilingual lexicon to support the alignment process, ii) TMs and EBMT systems can be linked in a dynamic sequential manner and iii) the combined translation of TMs and EBMT is in any case better than that of either single system.

1999

Inducing translation templates for example-based machine translation
Michael Carl
Proceedings of Machine Translation Summit VII

This paper describes an example-based machine translation (EBMT) system which relies on various knowledge resources. Morphologic analyses abstract the surface forms of the languages to be translated. A shallow syntactic rule formalism is used to percolate features in derivation trees. Translation examples serve the decomposition of the text to be translated and determine the transfer of lexical values into the target language. Translation templates determine the word order of the target language and the type of phrases (e.g. noun phrase, prepositional phrase, ...) to be generated in the target language. An induction mechanism generalizes translation templates from translation examples. The paper outlines the basic idea underlying the EBMT system and investigates the possibilities and limits of the translation template induction process.

Linking translation memories with example-based machine translation
Michael Carl | Silvia Hansen
Proceedings of Machine Translation Summit VII

The paper reports on experiments which compare the translation outcome of three corpus-based MT systems: a string-based translation memory (STM), a lexeme-based translation memory (LTM) and the example-based machine translation (EBMT) system EDGAR. We use a fully automatic evaluation method to compare the outcome of each MT system and discuss the results. We investigate the benefits of linking different MT strategies such as TM systems and EBMT systems.

1998

A Constructivist Approach to Machine Translation
Michael Carl
New Methods in Language Processing and Computational Natural Language Learning

Shallow Post Morphological Processing with KURD
Michael Carl | Antje Schmidt-Wigger
New Methods in Language Processing and Computational Natural Language Learning