Feng Wenhe

Also published as: 文贺


2024

pdf bib
基于隐性句逗号识别的汉语长句机器翻译(Machine translation of Chinese long sentences based on recognition of implicit period and comma)
Zhang Wenjuan (张文娟) | Li Manjia (李熳佳) | Feng Wenhe (冯文贺)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“长句翻译一直是机器翻译的难题。本文根据汉语中相当数量的逗号(句内标点)和句号(句间标点)可相互转化的特点,提出”隐性句号”(可转化为句号的逗号)和”隐性逗号”(可转化为逗号的句号)概念,并实现其自动识别,以将汉语长句变为短句用于汉英机器翻译。为此,首先通过人工与半监督学习结合方法构建了一个隐性句逗数据集,实现了基于预训练模型的隐性句逗识别方法,其中性能最好的HierarchicalBERT作为后续应用模型。进而,实现了基于隐性句逗识别的汉英机器翻译方法。在WMT2018(新闻)和WMT2023(文学)测试语料上基于预训练机器翻译模型的实验表明,对于汉语长句的英译,本文方法相比基准翻译的BLEU值整体有所提高,而且在相对稳健机器翻译模型上,呈现为句子越长本文方法效果越明显。”

pdf bib
Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese
Zhang Jingshen | Chen Xinglu | Qiu Xinying | Wang Zhimin | Feng Wenhe
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“Chinese sentence simplification faces challenges due to the lack of large-scale labeledparallel corpora and the prevalence of idioms. To address these challenges, we pro-pose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel frameworkthat combines data augmentation techniques. RISS introduces two key components: (1)Readability-guided Paraphrase Selection (RPS), a method for mining high-quality sen-tence pairs, and (2) Idiom-aware Simplification (IAS), a model that enhances the compre-hension and simplification of idiomatic expressions. By integrating RPS and IAS usingmulti-stage and multi-task learning strategies, RISS outperforms previous state-of-the-artmethods on two Chinese sentence simplification datasets. Furthermore, RISS achievesadditional improvements when fine-tuned on a small labeled dataset. Our approachdemonstrates the potential for more effective and accessible Chinese text simplification.”