Maoxi Li


2021

融合XLM词语表示的神经机器译文自动评价方法(Neural Automatic Evaluation of Machine Translation Method Combined with XLM Word Representation)
Wei Hu (胡纬) | Maoxi Li (李茂西) | Bailian Qiu (裘白莲) | Mingwen Wang (王明文)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

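Automatic evaluation of machine translation plays an important role in advancing the development and application of machine translation. It generally measures the quality of a machine translation by computing its similarity to a human reference translation. This paper uses the cross-lingual pretrained language model XLM to map the source sentence, the machine translation, and the human reference translation into the same semantic space. It combines hierarchical attention and intra-attention to extract difference features between the source sentence and the machine translation, between the machine translation and the human reference translation, and between the source sentence and the human reference translation, and integrates these features into a Bi-LSTM-based neural method for automatic translation evaluation. Experimental results on the WMT'19 automatic evaluation dataset show that the neural evaluation method combined with XLM word representations significantly improves correlation with human evaluation.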

2020

引入源端信息的机器译文自动评价方法研究(Research on Incorporating the Source Information to Automatic Evaluation of Machine Translation)
Qi Luo (罗琪) | Maoxi Li (李茂西)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

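Automatic evaluation of machine translation is an important task in machine translation. Current automatic evaluation methods ignore the source sentence entirely and measure translation quality using only the human reference translation. To address this shortcoming, this paper proposes an automatic evaluation method that incorporates source sentence information: a quality vector describing translation quality is extracted from the pair formed by the machine translation and its source sentence, and a deep neural network fuses it with an automatic evaluation method based on contextual word embeddings. Experimental results on the WMT'19 automatic evaluation task dataset show that the proposed method effectively strengthens the correlation between automatic evaluation and human evaluation. Further experimental analysis shows that source sentence information plays an important role in automatic evaluation.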

“细粒度英汉机器翻译错误分析语料库”的构建与思考(Construction of Fine-Grained Error Analysis Corpus of English-Chinese Machine Translation and Its Implications)
Bailian Qiu (裘白莲) | Mingwen Wang (王明文) | Maoxi Li (李茂西) | Cong Chen (陈聪) | Fan Xu (徐凡)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

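Machine translation error analysis aims to identify the errors in machine translation output, including error types and error distribution, and plays an important role in machine translation research and application. This paper combines human post-editing with error analysis by annotating post-editing operations with error labels. Using a combination of automatic and manual annotation, it builds a fine-grained error analysis corpus for English-Chinese machine translation. Each annotated sample contains the source sentence, the machine translation, the human reference translation, the post-edited translation, the word error rate, and error type annotations. The annotated error types include added words, omitted words, wrong words, word order errors, untranslated words, and named entity translation errors. A consistency check of the annotations demonstrates their validity, and statistical analysis of the annotated corpus can effectively guide the development of machine translation systems and post-editing by human translators.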

2018

Building Parallel Monolingual Gan Chinese Dialects Corpus
Fan Xu | Mingwen Wang | Maoxi Li
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Improving Machine Translation Quality Estimation with Neural Network Features
Zhiming Chen | Yiming Tan | Chenlin Zhang | Qingyu Xiang | Lilin Zhang | Maoxi Li | Mingwen Wang
Proceedings of the Second Conference on Machine Translation

Neural Post-Editing Based on Quality Estimation
Yiming Tan | Zhiming Chen | Liu Huang | Lilin Zhang | Maoxi Li | Mingwen Wang
Proceedings of the Second Conference on Machine Translation

2016

Extract Domain-specific Paraphrase from Monolingual Corpus for Automatic Evaluation of Machine Translation
Lilin Zhang | Zhen Weng | Wenyan Xiao | Jianyi Wan | Zhiming Chen | Yiming Tan | Maoxi Li | Mingwen Wang
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

Building Monolingual Word Alignment Corpus for the Greater China Region
Fan Xu | Xiongfei Xu | Mingwen Wang | Maoxi Li
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

2013

Listwise Approach to Learning to Rank for Automatic Evaluation of Machine Translation
Maoxi Li | Aiwen Jiang | Mingwen Wang
Proceedings of Machine Translation Summit XIV: Papers

2012

Confusion Network Based System Combination for Chinese Translation Output: Word-Level or Character-Level?
Maoxi Li | Mingwen Wang
Proceedings of the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT

2011

Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level?
Maoxi Li | Chengqing Zong | Hwee Tou Ng
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2009

The CASIA statistical machine translation system for IWSLT 2009
Maoxi Li | Jiajun Zhang | Yu Zhou | Chengqing Zong
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper reports on the participation of CASIA (Institute of Automation, Chinese Academy of Sciences) in the evaluation campaign of the International Workshop on Spoken Language Translation 2009. We participated in the challenge tasks for both Chinese-to-English and English-to-Chinese translation, and in the BTEC task for Chinese-to-English translation only. For all of the tasks, system performance is improved by the following methods: 1) combining different results of Chinese word segmentation, 2) combining different results of word alignment, 3) adding reliable bilingual words with high probabilities to the training data, 4) handling named entities separately, including person names, location names, organization names, and temporal and numerical expressions, 5) combining and selecting translations from the outputs of multiple translation engines, and 6) replacing Chinese characters with Chinese Pinyin to train the translation model for the Chinese-to-English ASR challenge task. This is a new approach that has not been introduced before.

2008

The CASIA statistical machine translation system for IWSLT 2008
Yanqing He | Jiajun Zhang | Maoxi Li | Licheng Fang | Yufeng Chen | Yu Zhou | Chengqing Zong
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes our statistical machine translation system (CASIA) used in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2008. In this year's evaluation, we participated in the challenge tasks for Chinese-English and English-Chinese and in the BTEC task for Chinese-English. We give an overview of our system and describe its primary modules, key techniques, and evaluation results.