Na Ye

2025

Logical table-to-text generation (LT2T) seeks to produce logically faithful textual descriptions based on tables. Current end-to-end LT2T models, which use descriptions directly as learning objectives, frequently struggle to maintain logical faithfulness because they lack reasoning knowledge. Recent research has introduced model-generated reasoning knowledge for the LT2T task, but the accompanying noise limits its performance. We therefore propose a reasoning knowledge filter framework that leverages collaboration between large language models and smaller models to filter for data points with high-quality reasoning knowledge. This framework aims to provide well-matched triplets of table, description, and reasoning knowledge for LT2T. Results on the LogicNLG dataset show that the method achieves the best performance with a reduced amount of data. Specifically, it improves SP-Acc by 1.4 points and NLI-Acc by 0.7 points over the current state-of-the-art model.

Nested Named Entity Recognition Based on Evidence Theory and Local Semantic Discrimination
Bobo Xu | Na Ye | Mingchong Jiang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

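Nested named entity recognition (NER) is a fundamental task in natural language processing that aims to identify and extract nested entities and their corresponding semantic types with computer-assisted techniques. The mainstream approach to nested NER is span-based: it treats entity recognition as a span classification task and can handle nested entities effectively. However, span-based nested NER methods cannot accurately distinguish the subtle semantic differences between similar entities, and enumerating spans produces a large number of noisy spans that degrade model performance. To address these problems, this paper proposes a method that quantifies the uncertainty of model predictions and uses that uncertainty to assist inference, reducing the impact of noisy spans on model performance, while a local semantic discrimination module distinguishes the semantic differences between entities. Specifically, to address the impact of noisy spans, the paper designs an uncertainty-guided KNN-assisted decision mechanism that corrects predictions when uncertainty is high. In addition, because nested NER models have limited ability to handle fuzzy entity boundaries and semantic overlap, the local semantic discrimination module models the representation differences between the current span and its neighboring spans, guiding the model to attend to fine-grained semantic differences and thereby improving the recognition accuracy of nested entities. The method achieves F1 scores of 81.27% and 82.26% on the English GENIA dataset and a self-built Chinese nested dataset, improvements of 0.52% and 1.48% F1 over the baseline models, respectively, which verifies its effectiveness for nested NER.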

A Medical Named Entity Recognition Method Based on Multi-Dimensional Feature Enhancement with Large Language Models
Mingchong Jiang | Na Ye | Bobo Xu
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

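Medical named entity recognition is crucial for medical information extraction and knowledge graph construction, but because of the specialized and complex nature of the medical domain it faces the challenges of data scarcity, indistinct features, and insufficient use of context. This paper proposes LLM-MedNER, which makes full use of the pre-trained knowledge of large language models (LLMs): prompt engineering generates augmented text that is semantically equivalent but varied in expression, and multi-dimensional features are extracted, including keyword sets, semantic descriptions, part-of-speech information, and medical entity association features, which significantly improves the model's feature representation capability. The method uses a dual-channel MacBERT-BiGRU encoding module to learn the original text features and the LLM-enhanced features in parallel, and fuses the different semantic features through a cross-attention mechanism. An adaptive multi-granularity dilated convolution layer then captures multi-scale local context through one-dimensional convolutions with different dilation rates, further enriching the word representations. A Biaffine module at the output layer identifies entity boundaries and types precisely. Comparative experiments show that LLM-MedNER outperforms existing baseline methods on multiple medical NER datasets, and ablation experiments further confirm the effectiveness of each module.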

2016

Interactive-Predictive Machine Translation based on Syntactic Constraints of Prefix
Na Ye | Guiping Zhang | Dongfeng Cai
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Interactive-predictive machine translation (IPMT) is a translation mode that combines machine translation technology with human behaviours. In an IPMT system, how the prefix is used greatly affects interaction efficiency. However, state-of-the-art methods filter translation hypotheses mainly by matching them against the prefix at the character level, so the advantage of the prefix is not fully exploited. To address this problem, this paper mines the deeper constraints that the prefix imposes at the syntactic level to improve the performance of IPMT systems. Two syntactic subtree matching rules based on phrase structure grammar are proposed to filter the translation hypotheses more strictly. Experimental results on LDC Chinese-English corpora show that the proposed method outperforms a state-of-the-art phrase-based IPMT system while keeping comparable decoding speed.

2015

2011

Study on the Impact Factors of the Translators’ Post-editing Efficiency in a Collaborative Translation Environment
Na Ye | Guiping Zhang
Proceedings of Machine Translation Summit XIII: Papers
