2014
English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling
Sharid Loáiciga | Thomas Meyer | Andrei Popescu-Belis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper presents a method for verb phrase (VP) alignment in an English-French parallel corpus and its use for improving statistical machine translation (SMT) of verb tenses. The method starts from automatic word alignment performed with GIZA++, and relies on a POS tagger and a parser, in combination with several heuristics, in order to identify non-contiguous components of VPs, and to label the aligned VPs with their tense and voice on each side. This procedure is applied to the Europarl corpus, leading to the creation of a smaller, high-precision parallel corpus with about 320,000 pairs of finite VPs, which is made publicly available. This resource is used to train a tense predictor for translation from English into French, based on a large number of surface features. Three MT systems are compared: (1) a baseline phrase-based SMT; (2) a tense-aware SMT system using the above predictions within a factored translation model; and (3) a system using oracle predictions from the aligned VPs. For several tenses, such as the French “imparfait”, the tense-aware SMT system improves significantly over the baseline and is closer to the oracle system.
Cross-linguistic annotation of narrativity for English/French verb tense disambiguation
Cristina Grisot | Thomas Meyer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper presents manual and automatic annotation experiments for a pragmatic verb tense feature (narrativity) in English/French parallel corpora. The feature is considered to play an important role for translating English Simple Past tense into French, where three different tenses are available. Whether the French Passé Composé, Passé Simple or Imparfait should be used is highly dependent on a longer-range context, in which either narrative events ordered in time or mere non-narrative states of affairs in the past are described. This longer-range context is usually not available to current machine translation (MT) systems, which are trained on parallel corpora. Annotating narrativity prior to translation is therefore likely to help current MT systems. Our experiments show that narrativity can be reliably identified with kappa values of up to 0.91 in manual annotation and with F1 scores of up to 0.72 in automatic annotation.
2013
Implicitation of Discourse Connectives in (Machine) Translation
Thomas Meyer | Bonnie Webber
Proceedings of the Workshop on Discourse in Machine Translation
Detecting Narrativity to Improve English to French Translation of Simple Past Verbs
Thomas Meyer | Cristina Grisot | Andrei Popescu-Belis
Proceedings of the Workshop on Discourse in Machine Translation
Machine Translation with Many Manually Labeled Discourse Connectives
Thomas Meyer | Lucie Poláková
Proceedings of the Workshop on Discourse in Machine Translation
2012
Using Sense-labeled Discourse Connectives for Statistical Machine Translation
Thomas Meyer | Andrei Popescu-Belis
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies
Bruno Cartoni | Thomas Meyer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Translation studies rely more and more on corpus data to examine the specificities of translated texts, which can be translated from different original languages and compared with original texts. In parallel, more and more multilingual corpora are becoming available for various natural language processing tasks. This paper questions the use of these multilingual corpora in translation studies and shows the methodological steps needed in order to obtain more reliably comparable sub-corpora that consist of original and directly translated text only. Various experiments are presented that show the advantage of directional sub-corpora.
Discourse-level Annotation over Europarl for Machine Translation: Connectives and Pronouns
Andrei Popescu-Belis | Thomas Meyer | Jeevanthi Liyanapathirana | Bruno Cartoni | Sandrine Zufferey
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper describes methods and results for the annotation of two discourse-level phenomena, connectives and pronouns, over a multilingual parallel corpus. Excerpts from Europarl in English and French have been annotated with disambiguation information for connectives and pronouns, for about 3600 tokens. This data is then used in several ways: for cross-linguistic studies, for training automatic disambiguation software, and ultimately for training and testing discourse-aware statistical machine translation systems. The paper presents the annotation procedures and their results in detail, and overviews the first systems trained on the annotated resources and their use for machine translation.
Machine Translation of Labeled Discourse Connectives
Thomas Meyer | Andrei Popescu-Belis | Najeh Hajlaoui | Andrea Gesmundo
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers
This paper shows how the disambiguation of discourse connectives can improve their automatic translation, while preserving the overall performance of statistical MT as measured by BLEU. State-of-the-art automatic classifiers for rhetorical relations are used prior to MT to label discourse connectives that signal those relations. These labels are used for MT in two ways: (1) by augmenting factored translation models; and (2) by using the probability distributions of labels in order to train and tune SMT. The improvement of translation quality is demonstrated using a new semi-automated metric for discourse connectives, on the English/French WMT10 data, while BLEU scores remain comparable to non-discourse-aware systems, due to the low frequency of discourse connectives.
2011
Disambiguating temporal-contrastive connectives for machine translation
Thomas Meyer
Proceedings of the ACL 2011 Student Session
How Comparable are Parallel Corpora? Measuring the Distribution of General Vocabulary and Connectives
Bruno Cartoni | Sandrine Zufferey | Thomas Meyer | Andrei Popescu-Belis
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Multilingual Annotation and Disambiguation of Discourse Connectives for Machine Translation
Thomas Meyer | Andrei Popescu-Belis | Sandrine Zufferey | Bruno Cartoni
Proceedings of the SIGDIAL 2011 Conference