Boxing Chen


2021

Context-Interactive Pre-Training for Document Machine Translation
Pengcheng Yang | Pei Zhang | Boxing Chen | Jun Xie | Weihua Luo
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Document machine translation aims to translate the source sentence into the target language in the presence of additional contextual information. However, it typically suffers from a lack of document-level bilingual data. To remedy this, we propose a simple yet effective context-interactive pre-training approach, which aims to benefit from external large-scale corpora. The proposed model performs inter-sentence generation to capture the cross-sentence dependency within the target document, and cross-sentence translation to make better use of valuable contextual information. Comprehensive experiments show that our approach achieves state-of-the-art performance on three benchmark datasets and significantly outperforms a variety of baselines.

Continual Learning for Neural Machine Translation
Yue Cao | Hao-Ran Wei | Boxing Chen | Xiaojun Wan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Neural machine translation (NMT) models are data-driven and require a large-scale training corpus. In practical applications, NMT models are usually trained on a general-domain corpus and then fine-tuned by continuing training on an in-domain corpus. However, this bears the risk of catastrophic forgetting, in which performance on the general domain decreases drastically. In this work, we propose a new continual learning framework for NMT models. We consider a scenario where training comprises multiple stages and propose a dynamic knowledge distillation technique to systematically alleviate the problem of catastrophic forgetting. We also find that a bias exists in the output linear projection when fine-tuning on the in-domain corpus, and propose a bias-correction module to eliminate it. We conduct experiments on three representative settings of NMT application. Experimental results show that the proposed method achieves superior performance compared to baseline models in all settings.

Breaking the Corpus Bottleneck for Context-Aware Neural Machine Translation with Cross-Task Pre-training
Linqing Chen | Junhui Li | Zhengxian Gong | Boxing Chen | Weihua Luo | Min Zhang | Guodong Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Context-aware neural machine translation (NMT) remains challenging due to the lack of large-scale document-level parallel corpora. To break the corpus bottleneck, in this paper we aim to improve context-aware NMT by taking advantage of the availability of both large-scale sentence-level parallel datasets and source-side monolingual documents. To this end, we propose two pre-training tasks. One learns to translate a sentence from the source language to the target language on the sentence-level parallel dataset, while the other learns to translate a document from a deliberately noised version to the original on the monolingual documents. Importantly, the two pre-training tasks are jointly and simultaneously learned via the same model, which is thereafter fine-tuned on scale-limited parallel documents from both sentence-level and document-level perspectives. Experimental results on four translation tasks show that our approach significantly improves translation performance. One nice property of our approach is that the fine-tuned model can be used to translate both sentences and documents.

G-Transformer for Document-Level Machine Translation
Guangsheng Bao | Yue Zhang | Zhiyang Teng | Boxing Chen | Weihua Luo
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Document-level MT models are still far from satisfactory. Existing work extends the translation unit from a single sentence to multiple sentences. However, studies show that when we further enlarge the translation unit to a whole document, supervised training of Transformer can fail. In this paper, we find that such failure is not caused by overfitting, but by sticking around local minima during training. Our analysis shows that the increased complexity of target-to-source attention is a reason for the failure. As a solution, we propose G-Transformer, which introduces a locality assumption as an inductive bias into Transformer, reducing the hypothesis space of the attention from target to source. Experiments show that G-Transformer converges faster and more stably than Transformer, achieving new state-of-the-art BLEU scores for both non-pretraining and pre-training settings on three benchmark datasets.

Adaptive Nearest Neighbor Machine Translation
Xin Zheng | Zhirui Zhang | Junliang Guo | Shujian Huang | Boxing Chen | Weihua Luo | Jiajun Chen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

kNN-MT, recently proposed by Khandelwal et al. (2020a), successfully combines a pre-trained neural machine translation (NMT) model with token-level k-nearest-neighbor (kNN) retrieval to improve translation accuracy. However, the traditional kNN algorithm used in kNN-MT simply retrieves the same number of nearest neighbors for each target token, which may cause prediction errors when the retrieved neighbors include noise. In this paper, we propose Adaptive kNN-MT to dynamically determine the value of k for each target token. We achieve this by introducing a light-weight Meta-k Network, which can be efficiently trained with only a few training samples. On four benchmark machine translation datasets, we demonstrate that the proposed method is able to effectively filter out the noise in retrieval results and significantly outperforms the vanilla kNN-MT model. Even more noteworthy is that the Meta-k Network learned on one domain can be directly applied to other domains and obtain consistent improvements, illustrating the generality of our method. Our implementation is open-sourced at https://github.com/zhengxxn/adaptive-knn-mt.
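As a rough illustration of the idea in the abstract above, the sketch below mixes an NMT distribution with kNN distributions built from different numbers of neighbors. It is not the authors' implementation (the linked repository holds that). The function names, the temperature value, the exponential distance kernel, and the convention that k = 0 stands for the NMT model alone are all assumptions made for illustration, and the per-k weights that a Meta-k Network would predict are simply passed in by hand.

```python
import math
from collections import defaultdict


def knn_distribution(neighbors, k, temperature=10.0):
    """Build a token distribution from the k closest (token, distance) pairs.

    Assumption: each neighbor contributes exp(-distance / temperature).
    """
    closest = sorted(neighbors, key=lambda pair: pair[1])[:k]
    scores = defaultdict(float)
    for token, distance in closest:
        scores[token] += math.exp(-distance / temperature)
    total = sum(scores.values())
    return {token: score / total for token, score in scores.items()}


def adaptive_mix(nmt_probs, neighbors, k_weights, temperature=10.0):
    """Combine NMT and kNN distributions using one weight per candidate k.

    k_weights maps k -> weight and should sum to 1. By the (assumed)
    convention used here, k == 0 means "use the NMT distribution only".
    """
    mixed = defaultdict(float)
    for k, weight in k_weights.items():
        if k == 0:
            dist = nmt_probs
        else:
            dist = knn_distribution(neighbors, k, temperature)
        for token, prob in dist.items():
            mixed[token] += weight * prob
    return dict(mixed)


# Hypothetical toy inputs: two candidate tokens and three retrieved neighbors.
nmt_probs = {"a": 0.5, "b": 0.5}
neighbors = [("a", 1.0), ("b", 2.0), ("a", 3.0)]
mixed = adaptive_mix(nmt_probs, neighbors, {0: 0.5, 1: 0.5})
```

With these toy weights, half of the probability mass comes from the NMT model and half from the single nearest neighbor; putting more weight on k = 0 is how such a scheme could ignore noisy retrievals.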

Manifold Adversarial Augmentation for Neural Machine Translation
Guandan Chen | Kai Fan | Kaibo Zhang | Boxing Chen | Zhongqiang Huang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

Domain Transfer based Data Augmentation for Neural Query Translation
Liang Yao | Baosong Yang | Haibo Zhang | Boxing Chen | Weihua Luo
Proceedings of the 28th International Conference on Computational Linguistics

Query translation (QT) serves as a critical factor in successful cross-lingual information retrieval (CLIR). Due to the lack of parallel query samples, neural QT models are usually optimized with synthetic data derived from large-scale monolingual queries. Nevertheless, this kind of pseudo corpus is mostly produced by a general-domain translation model, making it insufficient to guide the learning of a QT model. In this paper, we extend data augmentation with a domain transfer procedure, thereby revising synthetic candidates into search-aware examples. Specifically, the domain transfer model is built upon an advanced Transformer, in which layer coordination and mixed attention are exploited to speed up the refining process and leverage parameters from a pre-trained cross-lingual language model. To examine the effectiveness of the proposed method, we collected French-to-English and Spanish-to-English QT test sets, each of which consists of 10,000 parallel query pairs with careful manual checking. Qualitative and quantitative analyses reveal that our model significantly outperforms strong baselines and related domain transfer methods on both translation quality and retrieval accuracy.

Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences
Xiangyu Duan | Baijun Ji | Hao Jia | Min Tan | Min Zhang | Boxing Chen | Weihua Luo | Yue Zhang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we propose a new task of machine translation (MT), which is based on no parallel sentences but can refer to a ground-truth bilingual dictionary. Motivated by the ability of a monolingual speaker to learn to translate by looking up a bilingual dictionary, we propose the task to see how much potential an MT system can attain using the bilingual dictionary and large-scale monolingual corpora, while being independent of parallel sentences. We propose anchored training (AT) to tackle the task. AT uses the bilingual dictionary to establish anchoring points for closing the gap between the source language and the target language. Experiments on various language pairs show that our approaches are significantly better than various baselines, including dictionary-based word-by-word translation, dictionary-supervised cross-lingual word embedding transformation, and unsupervised MT. On distant language pairs, on which unsupervised MT struggles to perform well, AT performs remarkably better, achieving performance comparable to supervised SMT trained on more than 4M parallel sentences.

Self-Paced Learning for Neural Machine Translation
Yu Wan | Baosong Yang | Derek F. Wong | Yikai Zhou | Lidia S. Chao | Haibo Zhang | Boxing Chen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Recent studies have shown that the training of neural machine translation (NMT) can be facilitated by mimicking the learning process of humans. Nevertheless, the success of this kind of curriculum learning relies on the quality of an artificial schedule drawn up with handcrafted features, e.g., sentence length or word rarity. We improve this procedure in a more flexible manner by proposing self-paced learning, where the NMT model is allowed to 1) automatically quantify the learning confidence over training examples; and 2) flexibly govern its learning by regulating the loss in each iteration step. Experimental results over multiple translation tasks demonstrate that the proposed model yields better performance than strong baselines and models trained with human-designed curricula on both translation quality and convergence speed.

Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation
Pei Zhang | Boxing Chen | Niyu Ge | Kai Fan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Many document-level neural machine translation (NMT) systems have explored the utility of context-aware architectures, usually requiring an increasing number of parameters and greater computational complexity. However, little attention is paid to the baseline model. In this paper, we extensively study the pros and cons of the standard transformer in document-level translation, and find that the auto-regressive property brings both the advantage of consistency and the disadvantage of error accumulation. Therefore, we propose a surprisingly simple long-short term masking self-attention on top of the standard transformer to both effectively capture long-range dependence and reduce the propagation of errors. We examine our approach on two publicly available document-level datasets. It achieves strong BLEU results and captures discourse phenomena.

Iterative Domain-Repaired Back-Translation
Hao-Ran Wei | Zhirui Zhang | Boxing Chen | Weihua Luo
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we focus on domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent. One common and effective strategy for this case is exploiting in-domain monolingual data with the back-translation method. However, the synthetic parallel data is very noisy because it is generated by imperfect out-of-domain systems, resulting in poor domain adaptation performance. To address this issue, we propose a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair (DR) model to refine translations in synthetic bilingual data. To this end, we construct corresponding data for DR model training by round-trip translating the monolingual sentences, and then design a unified training framework to optimize paired DR and NMT models jointly. Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach, achieving average improvements of 15.79 and 4.47 BLEU over unadapted models and back-translation, respectively.

2019

Zero-Shot Cross-Lingual Abstractive Sentence Summarization through Teaching Generation and Attention
Xiangyu Duan | Mingming Yin | Min Zhang | Boxing Chen | Weihua Luo
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Abstractive Sentence Summarization (ASSUM) aims to grasp the core idea of the source sentence and present it as the summary. It has been extensively studied using statistical models or neural models based on large-scale monolingual source-summary parallel corpora. However, there is no cross-lingual parallel corpus, in which the source sentence language differs from the summary language, to directly train a cross-lingual ASSUM system. We propose to solve this zero-shot problem by using a resource-rich monolingual ASSUM system to teach a zero-shot cross-lingual ASSUM system on both summary word generation and attention. This teaching process is accompanied by a back-translation process that simulates source-summary pairs. Experiments on the cross-lingual ASSUM task show that our proposed method is significantly better than pipeline baselines and previous work, and brings cross-lingual performance much closer to monolingual performance.

Lattice Transformer for Speech Translation
Pei Zhang | Niyu Ge | Boxing Chen | Kai Fan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recent advances in sequence modeling have highlighted the strengths of the transformer architecture, especially in achieving state-of-the-art machine translation results. However, depending on the upstream systems, e.g., speech recognition or word segmentation, the input to the translation system can vary greatly. The goal of this work is to extend the attention mechanism of the transformer to naturally consume a lattice in addition to the traditional sequential input. We first propose a general lattice transformer for speech translation, where the input is the output of automatic speech recognition (ASR), which contains multiple paths and posterior scores. To leverage the extra information from the lattice structure, we develop a novel controllable lattice attention mechanism to obtain latent representations. On the LDC Spanish-English speech translation corpus, our experiments show that the lattice transformer generalizes significantly better and outperforms both a transformer baseline and a lattice LSTM. Additionally, we validate our approach on the WMT 2017 Chinese-English translation task with lattice inputs from different BPE segmentations. In this task, we also observe improvements over strong baselines.

2018

Alibaba’s Neural Machine Translation Systems for WMT18
Yongchao Deng | Shanbo Cheng | Jun Lu | Kai Song | Jingang Wang | Shenglan Wu | Liang Yao | Guchun Zhang | Haibo Zhang | Pei Zhang | Changfeng Zhu | Boxing Chen
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the submission systems of Alibaba for the WMT18 shared news translation task. We participated in 5 translation directions, including English ↔ Russian and English ↔ Turkish in both directions, and English → Chinese. Our systems are based on Google’s Transformer model architecture, into which we integrated the most recent features from academic research. We also employed most techniques that have proven effective in past WMT years, such as BPE, back translation, data selection, model ensembling and reranking, at industrial scale. For some morphologically rich languages, we also incorporated linguistic knowledge into our neural network. For the translation tasks in which we participated, our resulting systems achieved the best case-sensitive BLEU score in all 5 directions. Notably, our English → Russian system outperformed the second-ranked system by 5 BLEU points.

Alibaba Submission for WMT18 Quality Estimation Task
Jiayi Wang | Kai Fan | Bo Li | Fengming Zhou | Boxing Chen | Yangbin Shi | Luo Si
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

The goal of the WMT 2018 Shared Task on Translation Quality Estimation is to investigate automatic methods for estimating the quality of machine translation results without reference translations. This paper presents the QE Brain system, which proposes the neural Bilingual Expert model as a feature extractor based on a conditional target language model with a bidirectional transformer, and then processes the semantic representations of the source and the translation output with a Bi-LSTM predictive model for automatic quality estimation. The system has been applied to the sentence-level scoring and ranking tasks as well as the word-level tasks for finding errors for each word in translations. An extensive set of experimental results has shown that our system outperformed the best results in the WMT 2017 Quality Estimation tasks and obtained top results in WMT 2018.

Alibaba Submission to the WMT18 Parallel Corpus Filtering Task
Jun Lu | Xiaoyu Lv | Yangbin Shi | Boxing Chen
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the Alibaba Machine Translation Group submissions to the WMT 2018 Shared Task on Parallel Corpus Filtering. To evaluate the quality of the parallel corpus, three characteristics of the corpus are investigated, i.e., 1) the bilingual/translation quality, 2) the monolingual quality and 3) the corpus diversity. Both rule-based and model-based methods are adapted to score the parallel sentence pairs. The final parallel corpus filtering system is reliable, easy to build, and easy to adapt to other language pairs.

2017

Cost Weighting for Neural Machine Translation Domain Adaptation
Boxing Chen | Colin Cherry | George Foster | Samuel Larkin
Proceedings of the First Workshop on Neural Machine Translation

In this paper, we propose a new domain adaptation technique for neural machine translation called cost weighting, which is appropriate for adaptation scenarios in which a small in-domain data set and a large general-domain data set are available. Cost weighting incorporates a domain classifier into the neural machine translation training algorithm, using features derived from the encoder representation in order to distinguish in-domain from out-of-domain data. Classifier probabilities are used to weight sentences according to their domain similarity when updating the parameters of the neural translation model. We compare cost weighting to two traditional domain adaptation techniques developed for statistical machine translation: data selection and sub-corpus weighting. Experiments on two large-data tasks show that both the traditional techniques and our novel proposal lead to significant gains, with cost weighting outperforming the traditional methods.
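The weighting step described in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's formulation: the function name is hypothetical, the classifier probability is used directly as the weight, and the batch loss is averaged over sentences, all of which are assumptions. The per-sentence losses and in-domain probabilities are taken as given inputs.

```python
def weighted_batch_loss(sentence_losses, in_domain_probs):
    """Average per-sentence losses, each scaled by its in-domain probability.

    Assumption: the domain classifier's probability is used directly as the
    sentence weight, so sentences that look out-of-domain contribute less
    to the parameter update.
    """
    if len(sentence_losses) != len(in_domain_probs):
        raise ValueError("expected one probability per sentence loss")
    if not sentence_losses:
        raise ValueError("batch must contain at least one sentence")
    weighted = [
        loss * prob for loss, prob in zip(sentence_losses, in_domain_probs)
    ]
    return sum(weighted) / len(weighted)


# Hypothetical batch: the second sentence looks only half in-domain.
batch_loss = weighted_batch_loss([2.0, 4.0], [1.0, 0.5])
```

In this toy batch the second sentence has the larger raw loss, but its lower in-domain probability halves its contribution.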

NRC Machine Translation System for WMT 2017
Chi-kiu Lo | Boxing Chen | Colin Cherry | George Foster | Samuel Larkin | Darlene Stewart | Roland Kuhn
Proceedings of the Second Conference on Machine Translation

2016

Semi-supervised Convolutional Networks for Translation Adaptation with Tiny Amount of In-domain Data
Boxing Chen | Fei Huang
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

2015

Multi-level Evaluation for Machine Translation
Boxing Chen | Hongyu Guo | Roland Kuhn
Proceedings of the Tenth Workshop on Statistical Machine Translation

Representation Based Translation Evaluation Metrics
Boxing Chen | Hongyu Guo
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Bilingual Sentiment Consistency for Statistical Machine Translation
Boxing Chen | Xiaodan Zhu
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

A comparison of mixture and vector space techniques for translation model adaptation
Boxing Chen | Roland Kuhn | George Foster
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track

In this paper, we propose two extensions to the vector space model (VSM) adaptation technique (Chen et al., 2013b) for statistical machine translation (SMT), both of which result in significant improvements. We also systematically compare the VSM techniques to three mixture model adaptation techniques: linear mixture, log-linear mixture (Foster and Kuhn, 2007), and provenance features (Chiang et al., 2011). Experiments on NIST Chinese-to-English and Arabic-to-English tasks show that all methods achieve significant improvement over a competitive non-adaptive baseline. Except for the original VSM adaptation method, all methods yield improvements in the +1.7-2.0 BLEU range. Combining them gives further significant improvements of up to +2.6-3.3 BLEU over the baseline.

A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU
Boxing Chen | Colin Cherry
Proceedings of the Ninth Workshop on Statistical Machine Translation

2013

Adaptation of Reordering Models for Statistical Machine Translation
Boxing Chen | George Foster | Roland Kuhn
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Vector Space Model for Adaptation in Statistical Machine Translation
Boxing Chen | Roland Kuhn | George Foster
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Simulating Discriminative Training for Linear Mixture Adaptation in Statistical Machine Translation
George Foster | Boxing Chen | Roland Kuhn
Proceedings of Machine Translation Summit XIV: Papers

2012

Improving AMBER, an MT Evaluation Metric
Boxing Chen | Roland Kuhn | George Foster
Proceedings of the Seventh Workshop on Statistical Machine Translation

PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
Boxing Chen | Roland Kuhn | Samuel Larkin
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

AMBER: A Modified BLEU, Enhanced Ranking Metric
Boxing Chen | Roland Kuhn
Proceedings of the Sixth Workshop on Statistical Machine Translation

Unpacking and Transforming Feature Functions: New Ways to Smooth Phrase Tables
Boxing Chen | Roland Kuhn | George Foster | Howard Johnson
Proceedings of Machine Translation Summit XIII: Papers

Semantic smoothing and fabrication of phrase pairs for SMT
Boxing Chen | Roland Kuhn | George Foster
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

In statistical machine translation systems, phrases with similar meanings often have similar but not identical distributions of translations. This paper proposes a new soft clustering method to smooth the conditional translation probabilities for a given phrase with those of semantically similar phrases. We call this semantic smoothing (SS). Moreover, we fabricate new phrase pairs that were not observed in training data, but which may be used for decoding. In learning curve experiments against a strong baseline, we obtain a consistent pattern of modest improvement from semantic smoothing, and further modest improvement from phrase pair fabrication.

2010

Fast Consensus Hypothesis Regeneration for Machine Translation
Boxing Chen | George Foster | Roland Kuhn
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Lessons from NRC’s Portage System at WMT 2010
Samuel Larkin | Boxing Chen | George Foster | Ulrich Germann | Eric Joanis | Howard Johnson | Roland Kuhn
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Bilingual Sense Similarity for Statistical Machine Translation
Boxing Chen | George Foster | Roland Kuhn
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Phrase Clustering for Smoothing TM Probabilities - or, How to Extract Paraphrases from Phrase Tables
Roland Kuhn | Boxing Chen | George Foster | Evan Stratford
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

Phrase Translation Model Enhanced with Association based Features
Boxing Chen | George Foster | Roland Kuhn
Proceedings of Machine Translation Summit XII: Papers

A Comparative Study of Hypothesis Alignment and its Improvement for Machine Translation System Combination
Boxing Chen | Min Zhang | Haizhou Li | Aiti Aw
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

Regenerating Hypotheses for Statistical Machine Translation
Boxing Chen | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

I2R multi-pass machine translation system for IWSLT 2008.
Boxing Chen | Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we describe the system and approach used by the Institute for Infocomm Research (I2R) for the IWSLT 2008 spoken language translation evaluation campaign. In the system, we integrate various decoding algorithms into a multi-pass translation framework. The multi-pass approach enables us to utilize various decoding algorithms and to explore many more hypotheses. This paper reports our design philosophy, overall architecture, each individual system and the various system combination methods that we have explored. The performance on development and test sets is reported in detail in the paper. The system has shown competitive performance with respect to the BLEU and METEOR measures in the Chinese-English Challenge and BTEC tasks.

Exploiting N-best Hypotheses for SMT Self-Enhancement
Boxing Chen | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of ACL-08: HLT, Short Papers

2007

Better n-best translations through generative n-gram language models
Boxing Chen | Marcello Federico | Mauro Cettolo
Proceedings of Machine Translation Summit XI: Papers

I2R Chinese-English translation system for IWSLT 2007
Boxing Chen | Jun Sun | Hongfei Jiang | Min Zhang | Ai Ti Aw
Proceedings of the Fourth International Workshop on Spoken Language Translation

In this paper, we describe the system and approach used by the Institute for Infocomm Research (I2R) for the IWSLT 2007 spoken language evaluation campaign. A multi-pass approach is exploited to generate and select the best translation. First, we use two decoders, namely the open-source Moses and an in-house syntax-based decoder, to generate N-best lists. Next, we spawn new translation entries through a word-based n-gram language model estimated on the former N-best entries. Finally, we join the N-best lists from the previous two passes, and select the best translation by rescoring them with additional feature functions. In particular, this paper reports our effort on new translation entry generation and system combination. The performance on development and test sets is reported. The system was ranked first with respect to the BLEU measure in the Chinese-to-English open data track.

2006

The ITC-irst SMT system for IWSLT 2006
Boxing Chen | Roldano Cattoni | Nicola Bertoldi | Mauro Cettolo | Marcello Federico
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

Reordering rules for phrase-based statistical machine translation
Boxing Chen | Mauro Cettolo | Marcello Federico
Proceedings of the Third International Workshop on Spoken Language Translation: Papers

A Web-based Demonstrator of a Multi-lingual Phrase-based Translation System
Roldano Cattoni | Nicola Bertoldi | Mauro Cettolo | Boxing Chen | Marcello Federico
Demonstrations

2005

The ITC-irst SMT System for IWSLT-2005
Boxing Chen | Roldano Cattoni | Nicola Bertoldi | Mauro Cettolo | Marcello Federico
Proceedings of the Second International Workshop on Spoken Language Translation

Contextes multilingues alignés pour la désambiguïsation sémantique : une étude expérimentale
Boxing Chen | Meriam Haddara | Olivier Kraif | Grégoire Moreau de Montcheuil | Marc El-Bèze
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

This article addresses the semantic disambiguation of lexical units aligned across a multilingual corpus. We apply an unsupervised automatic method based on the comparison of semantic networks, and we derive a criterion for determining a priori whether two aligned units have a chance of disambiguating each other. Finally, we develop a method based on learning from bilingual contexts. By applying this criterion to determine for which units translational information should be taken into account, we obtain an improvement in results.

2004

Using a Word Sense Disambiguation system for translation disambiguation: the LIA-LIDILEM team experiment
Grégoire Moreau de Montcheuil | Marc El-Bèze | Boxing Chen | Olivier Kraif
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Combining clues for lexical level aligning using the Null hypothesis approach
Olivier Kraif | Boxing Chen
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Preparatory Work on Automatic Extraction of Bilingual Multi-Word Units from Parallel Corpora
Boxing Chen | Limin Du
International Journal of Computational Linguistics & Chinese Language Processing, Volume 8, Number 2, August 2003