Yachao Li
2025
Research on Tibetan-Chinese Document-Level Machine Translation and Corpus Construction (藏汉篇章机器翻译研究及语料库构建)
Jiale Tian | Jing Jiang | Yachao Li
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
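Document-level machine translation aims to use computers to automatically translate a source-language document into a target-language document with the same meaning, and it is an active frontier of machine translation research. Compared with traditional sentence-level translation, taking the document as the translation unit lets the model exploit contextual information more effectively and improves the consistency and coherence of the translation, giving it broad application prospects and research value. Compared with machine translation research on resource-rich languages (such as Chinese, English, and French), Tibetan machine translation suffers from scarce resources and a limited number of publicly available datasets, and no published paper has yet explored document-level machine translation for it. In view of this, this paper first constructs a Tibetan-Chinese translation dataset annotated with sentence-level, paragraph-level, and document-level boundaries, providing a high-quality, multi-granularity annotated dataset for the Tibetan-Chinese document translation task. Based on this dataset, the paper then studies Tibetan-Chinese document-level machine translation and compares translation quality at the sentence, paragraph, and document levels. The constructed Tibetan-Chinese document translation corpus is released as open source in the hope of advancing related research. Link: https://github.com/liyc7711/tb-zh-mt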
2018
Adaptive Weighting for Neural Machine Translation
Yachao Li | Junhui Li | Min Zhang
Proceedings of the 27th International Conference on Computational Linguistics
In the popular sequence-to-sequence (seq2seq) approach to neural machine translation (NMT), there exist many weighted sum models (WSMs), each of which takes a set of inputs and generates one output. However, the weights in a WSM are independent of each other and fixed for all inputs, suggesting that, by ignoring the different needs of its inputs, the WSM lacks effective control over the influence of each input. In this paper, we propose adaptive weighting for WSMs to control the contribution of each input. Specifically, we apply adaptive weighting to both the GRU and the output state in NMT. Experiments on Chinese-to-English and English-to-German translation demonstrate that the proposed adaptive weighting substantially improves translation accuracy, achieving significant improvements of 1.49 and 0.92 BLEU points on the two translation tasks. Moreover, we discuss in depth what type of information is encoded in the encoder and how that information influences the generation of target words in the decoder.