Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He (Editors)


Anthology ID:
2024.ccl-1
Month:
July
Year:
2024
Address:
Taiyuan, China
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
URL:
https://aclanthology.org/2024.ccl-1/
PDF:
https://aclanthology.org/2024.ccl-1.pdf

pdf bib
MITF:基于图像映射文本特征的跨模态图文检索方法(MITF:Cross-modal Image-text Retrieval Method with Mapping Images to Text Features)
Lou Xinyue (娄馨月) | Li You (李铀) | Qi Rui (齐睿) | Chen Yufeng (陈钰枫) | Xu Jinan (徐金安)

“Reducing the semantic gap between image and text information and promoting cross-modal alignment and fusion has always been key to cross-modal image-text retrieval. However, because the image encoder and text encoder of existing dual-stream models are trained separately, aligning and fusing image and text features is difficult. This paper therefore proposes a Mapping Images to Text Features (MITF) network that maps information from different modalities (image and text) into a single modality (text), further strengthening cross-modal semantic fusion and alignment and improving image-text retrieval performance. Specifically, with the parameters of the pre-trained Chinese vision-language model Chinese-CLIP frozen, an MITF network is trained to map images into pseudo-language tokens, and an automatic prompt-learning mechanism is introduced on this basis to improve the model's understanding of the pseudo-language tokens. A Faiss index is also built at retrieval time to speed up search. Experiments on three open-source datasets show that, compared with the original Chinese-CLIP model, the proposed method improves Mean Recall by 3.7% on average and retrieval speed by roughly 4 times. Visualization of image and text features further shows that the proposed method improves the alignment between image features and text features.”
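A minimal sketch (not the authors' code) of the retrieval step described above: once image features have been mapped into the text feature space, they can be indexed with Faiss and matched against text queries by inner product. The array contents and the 512-dimensional feature size are illustrative assumptions.

```python
import faiss
import numpy as np

d = 512                                                    # assumed feature dimensionality
image_feats = np.random.rand(10000, d).astype("float32")   # stand-in for MITF image features
text_query = np.random.rand(1, d).astype("float32")        # stand-in for a text-query feature

# Normalize so inner product equals cosine similarity, as CLIP-style models use.
faiss.normalize_L2(image_feats)
faiss.normalize_L2(text_query)

index = faiss.IndexFlatIP(d)    # exact inner-product index
index.add(image_feats)          # index all candidate images

scores, ids = index.search(text_query, 10)   # top-10 images for the query
print(ids[0], scores[0])
```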

pdf bib
基于ChatGPT查询改写的文档检索方法(Document Retrieval Method Based on ChatGPT Query Rewriting)
Li Ao (李澳) | Tu Xinhui (涂新辉) | Xiong Yinghao (熊英豪)

“Query rewriting is a technique that improves retrieval quality by optimizing the query. Traditional methods based on pseudo-relevance feedback are limited by the quality of the pseudo-relevant documents. This paper proposes a document retrieval method based on ChatGPT query rewriting. The method does not depend on pseudo-relevant documents, avoiding the problem of their low quality. First, BM25 is used for an initial retrieval to obtain a first-pass result set, while ChatGPT generates a new query; the original query and the new query are then each fed to a reranking model to rerank the first-pass results, producing document relevance scores for each query; finally, the relevance scores of the two queries are fused to obtain the final document scores. Experiments on several retrieval test sets show that, compared with baseline models, the method improves nDCG@10 by about 4.5 percentage points on average.”
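A minimal sketch of the final fusion step, assuming a simple linear interpolation of the two relevance-score lists; the abstract does not specify the fusion formula, so `alpha` and the helper names are hypothetical.

```python
def fuse_scores(orig_scores, rewrite_scores, alpha=0.5):
    """Linearly interpolate per-document relevance scores from the original
    query and the ChatGPT-rewritten query. alpha is a hypothetical fusion
    weight; the paper does not state its exact scheme."""
    docs = set(orig_scores) | set(rewrite_scores)
    return {
        doc: alpha * orig_scores.get(doc, 0.0)
             + (1 - alpha) * rewrite_scores.get(doc, 0.0)
        for doc in docs
    }

# Usage: rerank the first-pass BM25 results by the fused score.
final = fuse_scores({"d1": 2.1, "d2": 0.7}, {"d1": 1.4, "d3": 1.9})
ranking = sorted(final, key=final.get, reverse=True)
```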

pdf bib
基于汉语字词资源的检索增强生成与应用评估(Chinese Character- and Word-Based Retrieval Augmented Generation and Application)
Yin Yaqi (殷雅琦) | Liu Yang (刘扬) | Wang Yue (王悦) | Liang Qiliang (梁启亮)

“Chinese follows the principle of “characters form words, and words form sentences,” so character- and word-level information is a basic and crucial computational resource. In the era of large language models, mining such resources and evaluating their utility is an important line of research for strengthening models' language ability. As an effective way to combine resources with models, retrieval-augmented generation has so far mostly been applied, for this kind of resource, to endangered languages the models have never learned; its potential value for languages the models have already learned remains to be explored. From a linguistic perspective, this paper builds a character and word resource with good example-sentence coverage and richness and, via retrieval-augmented generation, explores how to combine this resource with different tasks and models. Evaluation experiments show that the method brings significant accuracy gains across all tested models and tasks, 4.78% on average, including gains of 6.91%, 4.24%, and 3.19% on morpheme sense disambiguation, word sense disambiguation, and metaphor detection respectively, demonstrating the potential value of character and word resources for models' accurate language understanding. The resource construction, method exploration, and application evaluation provide new ideas and methods for combining linguistic resources with large language models.”

pdf bib
面向CQL的语料库检索引擎的高效实现(Efficient Implementation of a CQL-oriented Corpus Retrieval Engine)
Liu Tingchao (刘廷超) | Lu Luming (鲁鹿鸣) | Yang Liner (杨麟儿) | Wang Yu (王雨)

“Corpus retrieval tools occupy a pivotal position in linguistic research and are essential for efficient information access. However, current Chinese corpus retrieval tools lack a unified standard for corpus query languages, and tools supporting the Corpus Query Language (CQL) for Chinese corpora are especially scarce. When searching Chinese corpora with tools that use different word-segmentation granularities, one encounters noise or poor recall. To address these challenges, we developed CAMELS, a CQL parser system supporting multi-granularity segmentation: a corpus retrieval engine that supports CQL queries, is compatible with multiple segmentation granularities, and supports retrieval of out-of-dictionary words. Tested with a variety of segmenters, the engine shows excellent recall and surpasses BlackLab in retrieval speed, offering linguists a more usable and precise retrieval tool.”

pdf bib
NNP-TDGM: 基于最近邻提示表征的术语DEF生成模型(NNP-TDGM: Nearest Neighbor Prompt Term DEF Generation Model)
Shen Sijia (沈思嘉) | Wang Peiyan (王裴岩) | Wang Shengren (王胜任) | Wang Libang (王立帮)

“This paper studies automatic generation of term DEFs under the knowledge description language grammar of HowNet and proposes a Nearest Neighbor Prompt Term DEF Generation Model (NNP-TDGM). The term DEFs in the training set are organized into an explicit memory set; when the decoder generates the (first) sememe or a relation, it retrieves the core concepts, important attributes, and relation types carried by terms whose conceptual structure is identical or similar to the term being predicted, helping the model complete the DEF and alleviating insufficient decoder training on low-frequency samples. In addition, prompting a pre-trained language model yields semantic representation vectors of the term and of the conceptual information implied by its definition, mitigating the encoder's limited representational capacity. Experiments show that NNP-TDGM reaches 31.84% F1 on sememe-relation-sememe triples, 53.12% on relations, 51.55% on sememes, and 68.53% on first sememes, improvements of 3.38%, 1.45%, 1.08%, and 0.48% over the baseline methods respectively.”

pdf bib
SpanCS:面向跨语言代码生成的片段级语码转换(SpanCS: Span-Level Code-Switching for Cross-Lingual Code Generation)
Zhu Qingfu (朱庆福) | Zhou Shiqi (周士祺) | Wang Shuo (王硕) | Zhang Zhiming (张致铭) | Wang Haoyu (王昊钰) | Chen Qiguang (陈麒光) | Che Wanxiang (车万翔)

“Cross-lingual code generation aims to transfer English-to-code generation ability to other natural languages. Translate-Train and Code-Switching are two classic data augmentation approaches for cross-lingual transfer; their strengths are complementary, but they have not yet been effectively combined. This paper proposes SpanCS, a span-level code-switching method for cross-lingual code generation. First, the method uses a code-switching framework to associate source-language context with target-language spans, promoting interaction and alignment across languages. Second, it uses Translate-Train to extract target-language spans from complete source-language translations, ensuring semantic consistency between the augmented data and the original data. To evaluate code-generation performance fairly across natural languages, we built MHumanEval, a multilingual code-generation benchmark covering 10 natural languages, via human translation and verification based on HumanEval. Experiments with three backbone models on this benchmark show that SpanCS consistently outperforms previous data augmentation methods on cross-lingual code generation.”

pdf bib
场景图增强的视觉语言常识推理生成(Scene Graph Enhanced Visual Language Commonsense Reasoning Generation)
Yuan Fan (袁凡) | Li Piji (李丕绩)

“Visual-language commonsense reasoning is a family of tasks aimed at understanding visual scenes and is often used to evaluate the multimodal commonsense reasoning ability of AI systems. Reliable commonsense reasoning, however, requires detailed scene understanding, and existing methods that fine-tune pre-trained models cannot effectively exploit the object-relation information present in a given scene, so the soundness of their reasoning is quite limited. To address this, we propose SGEVL, a scene-graph-enhanced visual-language commonsense reasoning generation framework. The framework first supplies visual information via image-patch sequences and, through a gating mechanism containing attention modules, equips a large language model with the ability to understand visual information. Building on these visual-language capabilities, we further propose a scene-graph generation method that requires no location information. The generated scene graphs significantly improve the model's understanding of scene information, guiding it to produce high-quality answers and reasoning. Experiments on VCR, VQA-X, and e-SNLI-VE show that the proposed framework outperforms baseline models, and ablation studies and result visualizations further demonstrate the effectiveness of each module.”

pdf bib
基于逻辑推理和多任务融合的认知刺激对话生成方法(Cognitive stimulation dialogue generation method based on logical reasoning and multi-task integration)
Jiang Yuru (蒋玉茹) | Li Mengyuan (李梦媛) | Tao Yuyang (陶宇阳) | Qu Keming (区可明) | She Zepeng (佘泽鹏) | Shi Shuicai (施水才)

“Against the background of global aging, dialogue systems with cognitive stimulation are an important means of maintaining older adults' cognitive health. Research on the Chinese Cognitive Stimulation Conversation Dataset (CSConv) and on model construction has only just begun. This paper treats cognitive stimulation dialogue generation as a logical reasoning process with multi-task integration, modeling the logical relations among the emotion classification task, the decision task, and the response generation task as a reasoning process that guides large language model generation. For the decision task, we propose a decision model with a hierarchical encoder structure; decision experiments show it effectively improves decision accuracy. For the multi-task process, we propose a multi-task integration method that combines the models for the three tasks. Generation experiments show that the multi-task integration of classification, decision, and generation significantly improves response quality, demonstrating the method's effectiveness and advancement.”

pdf bib
基于思维链的跨语言多文档摘要生成技术研究(Cross-lingual Multi-document Summarization Based on Chain-of-Thought)
Qi Tian (祁天) | Yang Jianan (杨建安) | Zhao Tiejun (赵铁军) | Yang Muyun (杨沐昀)

“With accelerating globalization, efficient transmission and understanding of cross-lingual information has become especially important. Traditional multi-document summarization improves the efficiency of information access but often overlooks the particular challenges of cross-lingual settings. To mitigate this, we propose the task of cross-lingual multi-document summarization. We first build a comprehensive cross-lingual multi-document summarization test set as an evaluation benchmark, then propose a chain-of-thought-based generation method for cross-lingual multi-document summarization and validate it experimentally. Using several representative large language models, with both human and automatic evaluation, the results show that our chain-of-thought method achieves significant performance gains on cross-lingual multi-document summarization, offering an effective solution to information access across language barriers.”

pdf bib
面向语言学习者的跨语言反馈评语生成方法(Cross-Lingual Feedback Comment Generation for Language Learners)
An Jiyuan (安纪元) | Zhu Lin (朱琳) | Yang Erhong (杨尔弘)

“Feedback comment generation aims to provide corrective and explanatory feedback on language learners' output, promoting the development of their writing ability. Existing work focuses mainly on monolingual feedback comments, e.g., English comments for English learners, which ignores the comprehension difficulties non-native learners may face, especially when the comments involve unfamiliar linguistic knowledge. This paper therefore proposes the task of cross-lingual feedback comment generation (CLFCG), whose goal is to generate feedback comments in the learner's native language. We build the first English-Chinese cross-lingual feedback comment generation dataset, containing sentences produced by English learners paired with corresponding Chinese feedback comments, and explore a pipeline-based guided-enhancement generation method for pre-trained language models that feeds correction edits, cue words, and grammatical terms as additional input to guide and prompt the generation model. Experiments show the pipeline with added guidance performs well in both automatic evaluation (BLEU: 50.32) and human evaluation (Precision: 62.84). We analyze the results in depth, hoping to offer further insights for cross-lingual feedback comment generation.”

pdf bib
文本样式和主题框架引导下的大模型辅助儿童新闻生成(Text Styles and Thematic Framework Guided Large Modeling to Aid Children’s News Generation)
Du Xiaomeng (杜晓蒙) | Yu Dong (于东) | Liu Pengyuan (刘鹏远)

“Mainstream news is designed mostly for adults, is hard for children to understand, and fails to meet their reading needs. We propose TNC-LLM, a topic-based discourse-structure framework for children's news. The framework integrates two core modules, text style definition (TSD) and topic category definition (TCD): TSD uses several machine learning algorithms to analyze text style and paragraph layout at different granularities, while TCD analyzes content by topic to reveal the writing characteristics and content tendencies of children's news, ensuring the content is educational and age-appropriate. Our experiments mainly evaluate four models, including ChatGPT3.5, on converting adult news into child-oriented news. Results show that TNC-LLM brings significant gains on key dimensions of children's news generation such as content accuracy, textual interest, and educational value. The framework is also general and applicable to different kinds of large language models.”

pdf bib
基于对比学习和排名一致性的古代汉语翻译质量评估模型(Ancient Chinese translation quality evaluation model based on contrastive learning and ranking consistency)
Li Huaiming (李怀明) | Shao Yanqiu (邵艳秋) | Li Wei (李炜)

“Although automatic evaluation of machine translation now performs well, applying existing metrics to Classical-to-Modern Chinese translation is unsatisfactory. On the one hand, traditional methods can compare translations whose quality differs greatly but struggle to rank translations of similar quality. On the other hand, the ellipsis and complex constructions of Classical Chinese often cause omissions in translation, and traditional metrics tend to give such poor translations inflated scores. In this paper we propose CRATE, a translation quality evaluation model for Classical-to-Modern Chinese based on contrastive learning and ranking consistency. The model captures fine-grained ranking information about translation quality by enforcing ranking consistency between semantic similarity and matching scores. Moreover, when training the translation-source matching model with contrastive learning, we take the source text itself as a negative sample, effectively solving the problem that traditional metrics still score translations with omissions highly. To demonstrate effectiveness, we build a high-quality human-scored test set for Classical-to-Modern Chinese translation. Experiments show our model outperforms strong baselines and correlates more significantly with human scores.”

pdf bib
基于两种新颖辅助任务的端到端语音翻译(End-to-End Speech Translation Enhanced by Two Novel Auxiliary Tasks)
Dou Huaixia (窦怀厦) | Lvu Mengzhe (吕孟哲) | Li Junhui (李军辉)

“End-to-end speech translation is both cross-modal and cross-lingual, and exploiting these properties effectively is challenging. Within a multi-task learning framework, this paper proposes two novel auxiliary tasks. Speech-enhanced text translation injects speech-modality information into the text translation task to narrow the modality gap between speech and text, ultimately improving speech translation. Globally-aware conditional masked language modeling jointly models the transcript and the translation, using the global textual context to guide translation-model training. Experiments on eight language directions of MuST-C show the method significantly outperforms the baseline system and reaches performance competitive with other end-to-end speech translation systems. Further analysis shows the method narrows the speech-text modality gap and improves speech translation without hurting text translation.”

pdf bib
基于隐性句逗号识别的汉语长句机器翻译(Machine translation of Chinese long sentences based on recognition of implicit period and comma)
Zhang Wenjuan (张文娟) | Li Manjia (李熳佳) | Feng Wenhe (冯文贺)

“Long sentences have always been difficult for machine translation. Exploiting the fact that in Chinese a considerable number of commas (sentence-internal punctuation) and periods (sentence-final punctuation) are interchangeable, this paper introduces the notions of “implicit period” (a comma that can be converted into a period) and “implicit comma” (a period that can be converted into a comma) and implements their automatic recognition, so that long Chinese sentences can be turned into short ones for Chinese-English machine translation. We first build an implicit period/comma dataset by combining manual annotation with semi-supervised learning and implement recognition methods based on pre-trained models, of which the best-performing HierarchicalBERT is used in the downstream application. We then implement a Chinese-English machine translation method based on implicit period/comma recognition. Experiments with pre-trained machine translation models on the WMT2018 (news) and WMT2023 (literature) test sets show that, for English translation of long Chinese sentences, our method improves BLEU overall relative to the baseline translation, and on the relatively robust translation model the gains grow as sentences get longer.”
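A minimal sketch of how the recognized implicit periods could be used at translation time, with `is_implicit_period` standing in for the paper's HierarchicalBERT classifier and `translate` for the MT model; both names are hypothetical.

```python
def translate_long_sentence(sentence, is_implicit_period, translate):
    """Split a long Chinese sentence at commas classified as 'implicit
    periods', translate each short segment, and rejoin the outputs."""
    segments, start = [], 0
    for i, ch in enumerate(sentence):
        if ch == "，" and is_implicit_period(sentence, i):
            segments.append(sentence[start:i])   # cut at the implicit period
            start = i + 1
    segments.append(sentence[start:])
    return " ".join(translate(seg) for seg in segments if seg)
```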

pdf bib
基于知识蒸馏的低频词翻译优化策略(Knowledge Distillation-Based Optimization Strategy for Low-Frequency Word Translation in Neural Machine)
Guo Yifan (郭逸帆) | Zan Hongying (昝红英) | Yan Ziyue (阎子悦) | Xu Hongfei (许鸿飞)

“Neural machine translation usually needs large parallel corpora to translate well, yet every parallel corpus exhibits an imbalanced word-frequency distribution, which can bias what the model learns: models tend to learn high-frequency words while neglecting the key semantic information carried by low-frequency words. These neglected low-frequency words also contain important translation information, and ignoring them can hurt translation quality. Current approaches typically train a bilingual model and then assign words frequency-dependent weights, raising the weights of low-frequency words to improve their translation. In this paper we aim to improve the translation of words that are meaningful but relatively infrequent. We propose using knowledge distillation: we train a model that translates low-frequency words better and use it as a teacher to guide a student model in learning low-frequency word translation. We further propose a more stable dual-teacher distillation model that additionally preserves high-frequency performance, yielding stable gains across multiple tasks. Our single-teacher distillation model gains a further +0.64 BLEU over the SOTA on English→German, and the dual-teacher model a further +0.31 BLEU over the SOTA on Chinese→English; on English→German, English→Czech, and English→French, relative to the baseline's low-frequency translation quality and with high-frequency translation unchanged, it gains +1.24, +0.47, and +0.87 BLEU respectively.”
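A minimal sketch of a dual-teacher distillation objective consistent with the description above; the loss weights, temperature, and exact combination rule are assumptions, not the paper's reported setup.

```python
import torch
import torch.nn.functional as F

def dual_teacher_kd_loss(student_logits, teacher_lf_logits, teacher_hf_logits,
                         labels, alpha=0.5, beta=0.3, tau=2.0):
    """Cross-entropy on gold labels plus KL terms toward (a) a teacher
    specialized for low-frequency words and (b) a regular teacher that
    preserves high-frequency performance. alpha, beta, tau are illustrative."""
    ce = F.cross_entropy(student_logits, labels)
    log_p = F.log_softmax(student_logits / tau, dim=-1)
    kd_lf = F.kl_div(log_p, F.softmax(teacher_lf_logits / tau, dim=-1),
                     reduction="batchmean") * tau ** 2
    kd_hf = F.kl_div(log_p, F.softmax(teacher_hf_logits / tau, dim=-1),
                     reduction="batchmean") * tau ** 2
    return (1 - alpha - beta) * ce + alpha * kd_lf + beta * kd_hf
```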

pdf bib
融合确定性因子及区域密度的k-最近邻机器翻译方法(A k-Nearest-Neighbor Machine Translation Method Combining Certainty Factor and Region Density)
Qi Rui (齐睿) | Shi Xiangyu (石响宇) | Man Zhibo (满志博) | Xu Jinan (徐金安) | Chen Yufeng (陈钰枫)

“k-nearest-neighbor machine translation (kNN-MT) has been an important research direction in neural machine translation in recent years. Such methods can improve translation quality without updating the translation model, but the imbalance between high- and low-frequency words in the training data limits their effect, and a fixed k cannot yield good translations for data under different density distributions. We therefore propose a novel kNN-MT method that introduces a certainty factor (CF) to reduce the impact of data imbalance and dynamically selects k according to the data density around the test point. On a multi-domain German-English translation dataset, the method improves translation over the baseline in all four domains, by more than 1 BLEU in three of them, effectively improving the quality of neural machine translation.”
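A minimal sketch of density-dependent k selection, assuming k is taken to be the number of retrieved neighbors within a fixed distance radius; the paper's actual density criterion and its certainty factor are not reproduced here.

```python
import numpy as np

def dynamic_k(distances, k_max=32, radius=10.0):
    """Pick k from the local density around the test point: count how many
    retrieved datastore neighbors fall within `radius` and clip to k_max.
    radius and k_max are illustrative choices."""
    distances = np.sort(distances)          # ascending neighbor distances
    k = int((distances < radius).sum())     # neighbors inside the ball
    return max(1, min(k, k_max))

# Usage with distances returned by a datastore search:
dists = np.array([2.1, 3.5, 4.0, 9.8, 12.4, 15.0])
k = dynamic_k(dists)    # -> 4 neighbors used for the kNN distribution
```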

pdf bib
Ko-LLaMA:基于LLaMA的朝鲜语大语言模型(Ko-LLaMA: A Korean Large Language Model Based on LLaMA)
Pang Jie (庞杰) | Yan Xiaodong (闫晓东) | Zhao Xiaobing (赵小兵)

“Large language models have received very wide attention over the past two years; LLMs such as ChatGPT and GPT-4 have dramatically changed natural language processing research and taken exciting steps toward artificial general intelligence (AGI). Although several large language models such as LLaMA have been open-sourced, they focus mainly on English and Chinese corpora and have limited applicability to other languages, all the more so for minority languages such as Korean. In this paper, we extend LLaMA's existing vocabulary with an additional 20,000 Korean tokens, improving its Korean encoding and semantic understanding; we then continue pre-training on Korean data and perform SFT (Supervised Fine-Tuning) with a Korean instruction dataset, analyzing the effect of different data volumes on instruction tuning. After continued pre-training and instruction tuning, the model's ability to understand and follow Korean instructions improves markedly, greatly strengthening LLaMA's Korean understanding and generation. Experiments show the proposed Ko-LLaMA significantly improves over the original LLaMA in understanding and generating Korean. In addition, on the Korean text classification dataset YNAT we compare Ko-LLaMA against CINO (a model strong on minority languages) and several CINO model combinations, as well as the original LLaMA and GPT3.5. The results show Ko-LLaMA's Korean text classification ability far exceeds CINO, CINO combinations, and large language models such as LLaMA and GPT3.5 that lack Korean vocabulary expansion and continued pre-training on Korean corpora.”

pdf bib
TiComR:基于提示的藏文对话型阅读理解模型(TiComR: A Prompt-based Tibetan Conversational Reading Comprehension Model)
Pengmao Cairang (朋毛才让) | Sun Yuan (孙媛)

“Existing conversational reading comprehension models perform well on Chinese and English tasks, but because Tibetan differs markedly from Chinese and English in grammatical structure and modes of expression, these models struggle to model the dialogue history in Tibetan conversational reading comprehension. In view of this, this paper leverages the strength of current large models and proposes TiComR, a prompt-based dialogue-history modeling method that addresses the limited model performance on Tibetan conversational reading comprehension. The method introduces a prompt-based learning mechanism that highlights the dialogue history by adding prompts directly in the passage text rather than modifying passage token embeddings, enabling precise modeling of dialogue history during fine-tuning and strengthening the model's understanding of questions. Experiments show TiComR achieves significant gains on Tibetan conversational reading comprehension and also performs well on the English CoQA dataset. We release TiComR for research use at http://github.com/Tshor/TicomR.”

pdf bib
TiLamb:基于增量预训练的藏文大语言模型(TiLamb: A Tibetan Large Language Model Based on Incremental Pre-training)
Zhuang Wenhao (庄文浩) | Sun Yuan (孙媛) | Zhao Xiaobing (赵小兵)

“Language models under the “pre-train + fine-tune” paradigm show excellent performance, and as model size and training data grow, their ability to solve many natural language processing tasks improves markedly. Current large language models mainly support mainstream languages such as English and Chinese, limiting research on low-resource languages such as Tibetan. Addressing the scarcity of Tibetan data, the unsatisfactory quality of existing Tibetan pre-trained models, and poor downstream-task extensibility, we collected and cleaned 26.43GB of Tibetan data, took the open-source LLaMA2-7B as the base model, extended LLaMA2's vocabulary with about 30,000 Tibetan tokens to improve Tibetan encoding efficiency and semantic understanding, and obtained the Tibetan large language model base TiLamb via incremental pre-training. We built fine-tuning datasets of several thousand to tens of thousands of examples for various Tibetan downstream tasks; the fine-tuned TiLamb is validated on seven tasks, Tibetan news classification, entity-relation classification, machine reading comprehension, word segmentation, summarization, question answering, and question generation, with large improvements over traditional methods and other Tibetan pre-trained models on many metrics. We release TiLamb and some resources for research use at https://github.com/NLP-Learning/TiLamb.”

pdf bib
基于蒙古文文本语义辅助的噪声鲁棒蒙古语语音情感识别方法研究(Research on Noise-Robust Mongolian Speech Emotion Recognition Methods Based on Mongolian Text Semantics)
Liu Huan (刘欢) | Liang Kailin (梁凯麟) | Zuo Haolin (左昊麟) | Liu Rui (刘瑞)

“Speech emotion recognition (SER) in noisy environments aims to mine emotional features from speech signals with background noise and automatically predict the speaker's emotional state. Although this technology has advanced rapidly for languages such as English and Chinese, for low-resource languages like Mongolian, SER under noise is still in its infancy, lacking datasets and methods. To advance Mongolian SER, this study first builds a single-speaker speech emotion recognition dataset. Then, for accurate Mongolian SER in noisy environments, we propose MonSER, a text-speech bimodal baseline model for noisy Mongolian SER, in which text provides additional semantic information for the noisy speech signal. Concretely, the model extracts spectral features from the noisy speech and encodes the corresponding Mongolian text with the multilingual pre-trained model XLMBert; the two modalities are then fused and fed to a classifier to predict the emotion category. We train on this dataset and test the model's effectiveness. Experiments show our bimodal model's Mongolian SER accuracy under various noise conditions clearly surpasses speech-only unimodal SER systems. To simulate real scenarios where text may be missing, we also propose two text-masking strategies, and these experiments further validate the effectiveness of the text-speech bimodal design.”

pdf bib
基于神经编解码语言模型的老挝语韵律建模方法(A Method for Lao Prosody Modeling Based on Neural Codec Language Model)
Yi Ningjing (易宁静) | Wang Linqin (王琳钦) | Gao Shengxiang (高盛祥) | Yu Zhengtao (余正涛)

“To give synthesized speech the rich prosody and rhythm of human language, existing methods generally adopt duration predictors based on random noise, using randomly initialized latent variables to simulate the diverse rhythms of human speech. However, limited by the reliance on random noise, the synthesized speech still often lacks the diversity and prosodic richness of real speech. Unlike previous approaches, this paper proposes a prosody modeling method based on a neural codec language model (VALL-E): prior speed and pitch temporal-variation curves are used to model the distribution of prosodic variation and are effectively incorporated into the training of the neural codec language model, and at inference the prosody of the generated speech can be controlled through the prior temporal curves. Experiments show the method reaches a MOS of 4.05 for synthesized English audio and 3.61 for synthesized Lao audio. The neural-codec-based Lao prosody modeling method achieves good prosody controllability in both speed and pitch.”

pdf bib
基于通用依存句法的锡伯语句法树库构建研究(A Dependency Treebank for Xibe based on Universal Dependencies)
Zhou He (周贺)

“China is a multi-ethnic, multilingual country with rich ethnic-language resources, yet languages with few speakers and limited cultural influence commonly face endangerment, and documenting and preserving them matters for linguistics, ethnology, and anthropology. In this study we target Xibe, a Manchu-Tungusic language still actively used in China, collecting 1,200 sentences from Xibe grammar books, the Xibe newspaper Cabcal News (《察布查尔报》), and Xibe language textbooks, and using them to build a treebank annotated with lexical, morphological, and dependency information. We describe the construction process in detail, discuss the hard-to-resolve linguistic phenomena encountered during annotation, and present our annotation strategies. Annotation reveals that with deep Chinese-Xibe contact, Xibe has not only absorbed many Chinese loanwords but its sentence structure has also been affected to some degree. Based on the annotated treebank, we run automatic parsing experiments for Xibe, examining the effect of word, POS, and character features and of CINO, a pre-trained model for Chinese minority languages, on parsing performance.”

pdf bib
基于字节对编码的端到端藏语语音识别研究(End-to-End Tibetan Speech Recognition Study Based on Byte Pair Coding)
Cai Yuqing (蔡郁青) | Wang Chao (王超) | Renzeng Duojie (仁增多杰) | Zhu Yulei (朱宇雷) | Zhang Jin (张瑾) | Nyima Tashi (尼玛扎西)

“Addressing the inconsistent modeling units and unsatisfactory recognition in end-to-end Tibetan speech recognition research, this paper proposes a BPE-Conformer-CTC/Attention method. First, the method uses the byte pair encoding algorithm for speech modeling: by repeatedly merging the most frequent character pairs, text is segmented into manageable, meaningful units, balancing the granularity of modeling units and resolving the inconsistency of modeling units in Tibetan speech recognition. Second, a Conformer encoder effectively fuses the global and local dependencies of the audio sequence, strengthening the model's representational power. Finally, a joint CTC/Attention decoding strategy speeds up alignment and decoding, improving recognition accuracy and efficiency. Experiments on the open datasets XBMU-AMDO31 and TIBMD@MUCI show the proposed BPE-Conformer-CTC/Attention model achieves word error rates of 9.0% and 4.6% respectively, relative reductions of 14.2% and 30.3% versus the Transformer-CTC/Attention baseline, providing an effective solution for end-to-end Tibetan speech recognition.”
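A toy illustration of the byte-pair-encoding step described above: the most frequent adjacent symbol pair is merged repeatedly until the desired number of merges is reached. Real BPE for Tibetan would operate on syllables with end-of-word markers and frequency weighting; this sketch omits those details.

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merges from `corpus`, a list of words (strings)."""
    words = [list(w) for w in corpus]      # each word as a symbol sequence
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))    # count adjacent symbol pairs
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent pair
        merges.append(best)
        for w in words:                    # apply the merge everywhere
            i = 0
            while i < len(w) - 1:
                if (w[i], w[i + 1]) == best:
                    w[i:i + 2] = [w[i] + w[i + 1]]
                else:
                    i += 1
    return merges
```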

pdf bib
面向对话式阅读理解的高质量藏语数据集构建(Construction of high-quality Tibetan language dataset for conversational reading comprehension)
Dawa Cairen (达哇才仁) | Pengmao Cairang (朋毛才让) | Sun Yuan (孙媛)

“As an important research direction in conversational AI, conversational reading comprehension aims to let machines understand natural language text and conduct multi-turn dialogue to answer text-related questions. With the development of generative large models, the task has also become an important benchmark for evaluating them, and building high-quality datasets is a key task in this area. Current algorithms and models have made notable progress on many English datasets, even surpassing human performance. For low-resource languages, however, especially Tibetan, which lacks such datasets, conversational reading comprehension research is just beginning. Using a combined manual and semi-automatic strategy, this paper builds TiconvQA (Tibetan Conversational Question Answering), a Tibetan conversational reading comprehension dataset containing 20,358 dialogue pairs across three domains: people, geography, and news. Each dialogue includes the supporting passage and multi-turn consecutive question-answer pairs generated from it. We analyze and assess TiconvQA in detail for dialogue diversity, relevance, and linguistic phenomena, and optimize five factors that affect evaluation metrics in Tibetan conversational reading comprehension. Finally, we evaluate three classic conversational reading comprehension models and the Tibetan large model TiLamb on the dataset; the results validate the dataset's quality and show TiconvQA can be used to benchmark model performance on conversational reading comprehension.”

pdf bib
面向心理健康咨询的藏语数据集及大语言模型构建(Construction of Tibetan Datasets and Large Language Models for Psychological Health Counseling)
Zhu Mengxiao (朱孟笑) | Sha Jiu (沙九) | Feng Chong (冯冲)

“Anxiety and depression have become common psychological disorders, and appropriate counseling is important for relieving mental and psychological stress, yet due to stigma and other reasons many people do not receive timely counseling and treatment. With advances in artificial intelligence, the superior knowledge integration and chain-of-thought abilities of large language models (LLMs) make them effective tools for psychological counseling. However, the few existing counseling-oriented LLMs target resource-rich languages such as English and Chinese; for low-resource languages, the application of LLMs to counseling remains unstudied. Taking Tibetan as a representative low-resource language, this paper studies how to build a Tibetan counseling dataset and a Tibetan mental health large language model. First, we collect existing high-quality Chinese counseling dialogues and process them into a multi-turn mental-health dialogue dataset; next, we build a Chinese-Tibetan translation tool to translate the data into Tibetan multi-turn dialogues and apply multiple filtering mechanisms to produce high-quality Tibetan mental-health multi-turn dialogue data; based on the constructed data, we instruction-tune the existing general-purpose LLMs Baichuan2 and LLaMA2 to obtain Tibetan mental health large language models, which will be open-sourced for scientific research. Finally, experiments validate the effectiveness of both the released Tibetan mental-health multi-turn dialogue dataset and the Tibetan counseling large language model.”

pdf bib
融合多元特征表示的藏文命名实体识别方法(Research on Tibetan Named Entity Recognition Using Multi-Feature Fusion Representation)
Ejian Cairang (俄见才让) | Zhou Maoke (周毛克) | Chen Bo (陈波) | Zhao Xiaobing (赵小兵)

“Addressing the neglect of lexical and syllable-component information in syllable-embedding-based Tibetan named entity recognition (TNER), this paper proposes MECT-TL, a model based on a cross-Transformer architecture that fuses multiple data features: Tibetan syllable information, lexical information, and syllable-component information. MECT-TL combines Tibetan syllables with lexical information through a flat-lattice network structure and integrates syllable-component information, effectively improving the accuracy of Tibetan entity recognition. Experiments show the model improves F1 by 5.14 percentage points over the mainstream TNER baseline BiLSTM-CRF and by 4.18 points over the Transformer-based TENER model, indicating that fusing Tibetan lexical and syllable-component information can significantly improve TNER performance.”

pdf bib
PGA-SciRE:基于大语言模型的数据增强框架进行科学领域的关系抽取(PGA-SciRE:Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction)
Zhou Yang (周洋) | Dan Shimin (单世民) | Wei Hongkui (魏宏夔) | Zhao Zhehuan (赵哲焕) | Feng Wenshuo (冯文铄)

“Relation extraction aims to identify the relations between entity pairs mentioned in text. Advances in large language models have had a huge impact on natural language processing tasks. For relation extraction in the scientific domain, we propose PGA, a data augmentation framework for improving model performance on scientific relation extraction. The framework introduces two augmentation modes: using a large language model to paraphrase original training samples, obtaining pseudo-samples with the same meaning but different wording and form, and instructing the large language model to generate sentences that implicitly convey the relation and entity labels of original training samples. Both kinds of pseudo-samples participate, together with the original dataset, in training the relation extraction model. In experiments, the PGA framework improves the F1 scores of three mainstream relation extraction models in the scientific domain, and obtaining samples via large language models also effectively reduces the cost of manual data annotation.”

pdf bib
UFSC:基于统一特征空间构建的零样本关系抽取(UFSC: A Unified Feature Space Construction for Zero-Shot Relation Extraction)
Liu Yuchen (刘雨辰) | Duan Jianyong (段建勇) | Sun Kang (孙康) | Zhang Qing (张晴) | He Li (何丽) | Wang Hao (王昊) | Liu Jie (刘杰)

“Zero-shot relation extraction (ZSRE) aims to learn, from seen relations, the ability to extract unseen relations. Studies show that matching sample sentences against relation descriptions to predict unseen relations can effectively accomplish ZSRE. However, existing matching-framework methods rarely unify the feature spaces of sample sentences and relation descriptions and lack alignment between the two kinds of features. This paper therefore proposes a unified feature-space construction method designed for matching-based zero-shot relation extraction. We unify the encoding modules for sample sentences and relation descriptions and introduce a feature-similarity loss on top; meanwhile, to reduce the clustering of features in space, we introduce a feature-uniformity module that builds a more uniform feature space. The proposed method achieves performance gains: compared with the previous best results, F1 improves by an average of 1.6% on FewRel and 3.4% on Wiki-ZSL, demonstrating the effectiveness of unified feature-space construction and the feature-uniformity module.”

pdf bib
多机制整合的中文医疗命名实体识别(Infusing multi-schemes for Chinese Medical Named Entity Recognition)
Wang Shanshan (王珊珊) | Zhang Kunyuan (张焜元) | Yan Rong (闫蓉)

“In online medical consultation, because most patients lack medical training and the pathological characteristics of different disciplines are complex, medical named entities in doctor-patient dialogues tend to be long and multi-word, posing new challenges for named entity recognition algorithms. To address this, this study fuses dilated convolution mechanisms of multiple granularities and builds a Flat-Lattice-CNN model. The model considers not only the semantics of characters and words and their absolute and relative positions, but also extracts co-occurrence dependency features of multiple characters/words spanning different distances, improving the recognition of long medical named entities. Experiments show the proposed model brings consistent performance gains on the evaluated NER datasets; in particular, on the Chinese medical dataset CTDD, which is dominated by long entities, the model's F1 improves by about 2%, giving superior performance.”

pdf bib
基于两阶段提示学习的少样本命名实体识别(Two-Stage Prompt Learning for Few-Shot Named Entity Recognition)
Shao Jiaxing (邵佳兴) | Huang Qi (黄琪) | Xiao Cong (肖聪) | Liu Jing (刘璟) | Luo Wenbing (罗文兵) | Wang Mingwen (王明文)

“Few-shot named entity recognition aims to identify named entities with a small amount of labeled data. Inspired by the good performance of prompt learning in few-shot scenarios, this paper explores prompt-based few-shot NER. Existing prompt-learning methods enumerate all possible spans for entity recognition, which incurs high computational cost and underuses entity boundary information. We propose TSP-Few, a two-stage prompt-learning framework that performs few-shot NER without source-domain data. The first stage augments, filters, and expands seed spans: the seed augmentation module lets seed spans capture richer semantics, the seed filter removes many irrelevant spans, and the seed expansion module fully exploits entity boundary information, providing high-quality candidate entity spans for type classification. The second stage predicts the candidate spans' categories via prompt learning. In addition, to mitigate error accumulation from the span-detection stage, a negative-sampling strategy is introduced in entity classification. Training span detection and entity-type classification independently makes it easier to achieve excellent performance in few-shot settings. Experiments on three benchmark datasets show further performance gains over advanced methods, and the results also confirm the effectiveness of each module of the model.”

pdf bib
面向工艺文本的实体与关系最近邻联合抽取模型(Nearest Neighbor Joint Extraction Model for Entity and Relationship in Process Text)
Yang Danqingxin (杨丹清忻) | Wang Peiyan (王裴岩) | Xu Lijun (徐立军)

“This paper studies joint extraction of entities and relations in process texts and proposes a nearest-neighbor joint extraction model (NNJE). NNJE builds explicit memory from the character-collocation regularities at entity boundaries in process texts; via nearest-neighbor retrieval it finds, for the candidate combination to be predicted under a given relation, instances of terms with similar character collocations and identical or similar conceptual structure, providing stronger constraints for entity-boundary recognition and entity pairing, improving prediction precision and model performance. A process-text relation dataset was constructed for the experiments. Results show the method improves precision (P) by 3.53% and F1 by 1.03% over the baseline, outperforming PURE, CasRel, PRGC, and TPlinker, indicating the proposed method effectively improves triple extraction.”

pdf bib
融合扩展语义和标签层次信息的文档级事件抽取(Document-Level Event Extraction with Integrating Extended Semantics and Label Hierarchy Information)
Fu Yujiao (符玉娇) | Liao Jian (廖健) | Li Yang (李旸) | Guo Zhangfeng (郭张峰) | Wang Suge (王素格)

“Document-level event extraction is an important task in natural language processing, facing the challenges of scattered arguments and multiple event mentions. Existing work usually extracts arguments from all sentences of a document and captures inter-entity relations via argument-role modeling, ignoring the differing relevance between events and sentences within a document. This paper proposes a document-level event extraction method that integrates extended semantics and label-hierarchy information. First, a large language model semantically expands the text, the event-type labels, and the argument-role labels, introducing richer background semantics. Second, an event-type detection module based on relevance differences finds the sentences in the document highly relevant to each event type and constrains the extraction range of candidate entities, mitigating argument scattering. Further, for the multiple event types mentioned in a document, arguments are extracted from the candidate entities with a directed acyclic graph, recovering all event elements. Experiments on the ChFinAnn and DuEE-Fin datasets show that, compared with baseline models, the proposed method can specifically mitigate the scattering of arguments belonging to multiple events and effectively improve event-extraction performance.”

pdf bib
融合领域词汇扩充的低资源法律文书命名实体识别(Named Entity Recognition for Low-Resource Legal Documents Using Integrated Domain Vocabulary Expansion)
Paerhati Tulajiang (帕尔哈提吐拉江) | Sun Yuanyuan (孙嫒媛) | Cai Aichen (蔡艾辰) | Wang Yanhua (王艳华) | Lin Hongfei (林鸿飞)

“Current research on low-resource legal-document named entity recognition with pre-trained language models faces two main problems: (1) for low-resource languages such as Uyghur, legal-document corpora are extremely limited, and this scarcity constrains the training and performance of pre-trained language models; (2) the terminology used in legal documents is complex and specialized, and newly emerging legal terms and concepts are hard for existing models to accommodate. Addressing these problems, this paper builds on the multilingual pre-trained model mBERT and improves performance on Uyghur legal-document NER through domain-vocabulary expansion and model fine-tuning. We first compile a Uyghur legal-domain vocabulary list and add it to mBERT's vocabulary, then fine-tune on UgLaw-NERD, a manually annotated Uyghur legal-document NER dataset, validating the method's effectiveness. Experiments show that, compared with a baseline that only fine-tunes mBERT, the model with domain-vocabulary expansion raises NER F1 to 89.72%, a 7.39% improvement over the baseline. We also examine how different amounts of added domain vocabulary affect NER performance, finding that domain-vocabulary expansion strengthens the pre-trained model on Uyghur tasks. These conclusions offer a useful reference for pre-trained-model NLP research on other low-resource languages in the judicial domain.”

pdf bib
基于动态提示学习和依存关系的生成式结构化情感分析模型(Dynamic Prompt Learning and Dependency Relation based Generative Structured Sentiment Analysis Model)
Jia Yintao (贾银涛) | Cui Jiajia (崔佳佳) | Mu Lingling (穆玲玲) | Zan Hongying (昝红英)

“Structured sentiment analysis aims to extract from text all sentiment tuples consisting of a sentiment holder, a target, an opinion expression, and a sentiment polarity, making it a relatively comprehensive fine-grained sentiment analysis task. Addressing the error propagation, insufficient prompt-template adaptability, and complex composition of sentiment elements in current structured sentiment analysis methods, this paper proposes a generative structured sentiment analysis model based on dynamic prompt learning and dependency relations: prompt templates are designed separately for different sentiment-tuple compositions, the templates augment the input of a generative pre-trained model, and dependency relations enhance the generation. Experiments show the model's SF1 on the SemEval2022 dataset surpasses the compared baseline models.”

pdf bib
基于方面引导的图文渐进融合的多模态方面级情感分析方法(Aspect-Guided Progressive Fusion of Text and Image for Multimodal Aspect-Based Sentiment Analysis)
Yan Zida (闫自达) | Guo Junjun (郭军军) | Yu Zhengtao (余正涛)

“Multimodal aspect-based sentiment analysis aims to identify the sentiment polarity of specific aspects by combining image and text information. However, as two different modalities, image and text differ markedly in data form and semantic expression; narrowing the modality gap and fusing cross-modal features are two key problems in this task. This paper proposes an aspect-guided progressive image-text fusion method for multimodal aspect-based sentiment analysis: the aspect information shared by image and text serves as a pivot, aspect-guided image-text contrastive learning and contrast-based cross-modal semantic interaction narrow the modality gap and promote semantic interaction, and visual and textual information is then integrated in a multimodal feature space, with aspect-guided, contrast-based multimodal semantic fusion promoting cross-modal feature fusion and improving multimodal sentiment analysis performance. Experiments on three multimodal aspect-based sentiment analysis benchmarks demonstrate the effectiveness of the proposed method, which outperforms most other state-of-the-art multimodal aspect-based sentiment analysis methods.”

pdf bib
基于联邦知识蒸馏的跨语言社交媒体事件检测(Cross-Lingual Social Event Detection Based on Federated Knowledge Distillation)
Zhou Shuaishuai (周帅帅) | Zhu Enchang (朱恩昌) | Gao Shengxiang (高盛祥) | Yu Zhengtao (余正涛) | Xian Yantuan (线岩团) | Zhao Zixiao (赵子霄) | Chen Lin (陈霖)

“Social media event detection refers to mining trending events from the content of various social media platforms. In practice, due to data scarcity, social media event detection performs poorly in low-resource settings. Existing methods mainly alleviate the low-resource problem via cross-lingual knowledge transfer but ignore data privacy. This paper therefore proposes FedEvent, a cross-lingual social media event detection framework based on federated knowledge distillation, which aims to distill knowledge from resource-rich clients to low-resource ones. The framework combines parameter-efficient fine-tuning with three contrastive losses to effectively map non-English semantic spaces into the English semantic space, and adopts a federated distillation strategy to transfer knowledge while preserving data privacy. We further design a four-stage life-cycle mechanism to accommodate incremental scenarios. Finally, experiments on real datasets demonstrate the framework's effectiveness.”

pdf bib
基于生成式语言模型的立场检测探究(Research on Stance Detection with Generative Language Model)
Zhang Yuanshuo (张袁硕) | Li Aohua (李澳华) | Yin Zhaoning (尹召宁) | Wang Panyi (王潘怡) | Chen Bo (陈波) | Zhao Xiaobing (赵小兵)

“Stance detection has received growing attention in recent years, but the available labeled data is limited in both scope and scale and cannot effectively support neural-network-based stance detection. This paper therefore explores the capability of generative language models for stance detection in zero-shot and few-shot settings. We first build a new stance detection dataset containing 5 topics and 2,500 manually annotated examples, then conduct a series of exploratory experiments on it. The results show that: in the zero-shot setting, generative language models perform well with structured prompt learning; adding extra information significantly improves performance; in the few-shot setting, demonstrations with the same target clearly improve performance, while demonstrations with different targets have a negative effect; chain-of-thought prompting brings significant gains; and, inspired by prompt learning, fine-tuning pre-trained language models further confirms that extra information yields significant benefits for stance detection.”

pdf bib
基于双图注意力网络的篇章级散文情绪变化分析方法(A Document-Level Emotion Change Analysis Method Based on DualGATs for Prose)
Li Ailin (李爱琳) | Li Yang (李旸) | Wang Suge (王素格) | Li Shuqi (李书琪)

“In prose, the author's emotions shift along with the passage's paragraphs or sentences, for example from sadness to happiness or from joy to anger. We build a prose emotion-change dataset and propose an emotion-change analysis method based on dual graph attention networks that fuses multiple kinds of knowledge. First, an imagery knowledge base is introduced to build sentence representations fused with imagery knowledge; second, weighted context-dependency and discourse-dependency graphs are constructed, and by fusing contextual knowledge and discourse structure we obtain sentence representations integrating both; meanwhile, a pleasure-valence recognition layer yields sentence representations fused with valence information. The three representations are then concatenated and passed through a fully connected network to obtain the final emotion-change result. Experiments show the proposed method effectively identifies emotion changes, aiding the answering of emotion-change questions in prose reading comprehension.”

pdf bib
基于主题模型与图神经网络的突发公共卫生事件国际舆情演化分析研究(International Public Opinion Evolution Analysis on Sudden Public Health Events using Topic Model and Graph Neural Network)
Gao Jingjian (高境健) | Sang Guoming (桑国明) | Liu Zhi (刘智) | Zhang Yijia (张益嘉) | Lin Hongfei (林鸿飞)

“Studying how international public opinion on sudden public health events evolves offers important reference value for the emergency management of international opinion resources and for guiding public opinion. Using the Google News database and taking countries' COVID-19 coverage as the object, we build an international public-opinion dataset. With topic models and graph neural network models, combined with temporal and spatial dimensions and the opinion life cycle, we examine the evolution of global opinion topics and sentiment; the model reaches 0.7973 accuracy and 0.7826 F1, outperforming other baseline models. We find that national public opinion spreads in a radial pattern, and that the sentiment orientation and discussion topics of international media opinion are positively correlated and shift over time.”

pdf bib
面向社交媒体多特征增强的药物不良反应检测(Adverse drug reaction detection with multi-feature enhancement for social media)
Li Hao (李浩) | Qiu Yunzhi (邱云志) | Lin Hongfei (林鸿飞)

“Social media is one of the important channels for adverse drug reaction (ADR) detection. This paper proposes DMFE, a social-media-based ADR detection model that comprehensively captures patients' feedback on drug use. Compared with conventional text detection, social media data often contain non-standard grammar and misspelled words. We extract the abstract meaning representation (AMR) of social media data and use a graph attention network (GAT) to learn abstract semantic features, improving the model's understanding of semantic information, and use a character-level convolutional neural network (charCNN) to capture character features and reduce the impact of misspellings. In addition, we use prompt learning to incorporate MedDRA adverse-drug-reaction domain keywords, further strengthening the model's understanding of domain knowledge. In evaluation, DMFE achieves the best F1 against baseline models on the CADEC and TwiMed datasets.”

pdf bib
面向中文文本的情绪持有者抽取研究(Research on Emotion Holder Extraction for Chinese Texts)
Sun Yawei (孙亚伟) | Shi Yu (石宇) | Han Xu (韩旭)

“Emotion holders are the subjects bearing emotion in a text, and analyzing them is crucial to understanding textual emotion. However, existing research has not adequately considered coreference among emotion holders, and the lack of emotion-holder extraction data for Chinese has further limited progress. This paper builds an emotion-holder extraction dataset for Chinese text that effectively resolves the coreference problem in the data. We also propose a model fusing semantic, emotional, and part-of-speech features that performs efficient emotion-holder extraction and coreference resolution, surpassing baseline models on all performance metrics. Ablation experiments further confirm the effectiveness of the model design.”

pdf bib
AutoRG:一种大小模型协同的自动报告生成框架(AutoRG: An automatic report generation framework for Large and small model collaboration)
Zhang Jing (张京) | Shu Jiangming (舒江明) | Zhang Yuxiang (张宇翔) | Wu Bin (吴斌) | Wang Wei (王巍) | Yu Jian (于剑) | Sang Jitao (桑基韬)

“Automatic report generation has clear potential for improving work efficiency and saving human resources. The advent of large language models has improved report fluency and interpretability. However, existing work still relies on manual effort and lacks flexibility and richness; meanwhile, erroneous or redundant outputs from small models and the inherent randomness of large models make report quality unstable. This paper proposes AutoRG, an automatic report generation framework in which large and small models collaborate: the large model's tool understanding and planning abilities reduce manual intervention and improve report richness, while information-correction and report-iteration mechanisms improve report stability. Taking automatic patent report generation as the scenario, we test AutoRG comprehensively along multiple dimensions. Results show the framework has clear advantages in improving the richness and quality stability of generated reports.”

pdf bib
基于本体信息增强的人类表型概念识别(Ontology Information-augmented Human Phenotype Concept Recognition)
Qi Jiewei (祁杰蔚) | Luo Ling (罗凌) | Yang Zhihao (杨志豪) | Wang Jian (王健) | Lin Hongfei (林鸿飞)

“Automatically recognizing human phenotype concepts from text is of great significance for disease analysis. Existing ontology-driven phenotype concept recognition methods mainly use the concept names and synonyms in the ontology without fully exploiting the ontology's rich information. Addressing this, the paper proposes an ontology-information-augmented method for human phenotype concept recognition that uses an advanced large language model for data augmentation and designs an ontology-vector-augmented deep learning model to improve concept-recognition performance. Experiments on the GSC+ and ID-68 datasets show the proposed method can exploit the ontology's rich information to effectively improve baseline performance, achieving state-of-the-art results.”

pdf bib
基于机器学习的语音情感声学特征筛选(Acoustic Feature Selection for Speech Emotion Based on Machine Learning)
Dong Wenqi (董文琪) | Wang Han (王涵) | Zhang Jingwei (张璟玮)

“Selecting acoustic features that effectively express emotion is crucial for speech emotion research. For emotions with identical or similar acoustic features, fundamental frequency and duration alone, as used in acoustic studies, cannot discriminate effectively. This study enlarges the variety and number of acoustic parameters and uses three machine learning methods to select several sets of acoustic parameters that effectively distinguish emotion types, supplementing and refining the acoustic feature set for speech-emotion acoustic research. We find that the acoustic parameters relied on to distinguish different emotions, their number, and their contributions all differ, with spectral and signal-to-noise parameters playing an important role. The study provides a reference for parameter selection in the acoustic analysis of speech emotion.”

pdf bib
基于交互行为语义模式增强的ID推荐方法(Enhanced ID Recommendation Method Utilizing Semantic Patterns of Interactive Behaviors)
Wang Yuanlai (王远来) | Bai Yu (白宇) | Lian Peng (廉鹏)

“ID-based recommendation is a classic recommendation approach that relies on unique identifiers of users or items; it often faces problems such as sparse user-item interaction data and the lack of semantic information in symbolic IDs. Assuming latent pattern correlations among user-item interaction behaviors across different domains, this paper proposes an ID recommendation method enhanced by the semantic patterns of interaction behaviors. The method introduces auxiliary-domain information into the target-domain recommendation task, jointly encodes auxiliary- and target-domain information with a graph neural network, and, by introducing interaction-behavior semantic patterns, transfers the auxiliary domain's user-item interactions and item descriptions to the target domain, thereby achieving interaction-behavior semantic enhancement in target-domain ID recommendation. Experiments on 8 public datasets show better recommendations than current SOTA models, with gains of 3%-30% in Recall@20 and 1%-40% in NDCG@20.”

pdf bib
基于双层语义映射的大语言模型辅助古汉语事件抽取半自动标注框架(A Semi-automatic Annotation Framework for Event Extraction in Classical Chinese Assisted by Large Language Models Based on Two-Level Semantic Mapping)
Wei Congcong (卫聪聪) | Li Wei (李炜) | Feng Zhenbing (冯振冰) | Shao Yanqiu (邵艳秋)

“Although natural language processing (NLP) technology already offers fairly mature solutions for event extraction (EE) in modern languages, research on event extraction for Classical Chinese is constrained by the scarcity of annotated data and the complexity of textual semantics. We therefore propose using today's highly successful large language models (LLMs) to assist human annotators in data annotation. To counter LLMs' insufficient training on, and weak semantic understanding of, Classical Chinese, we propose an LLM-assisted semi-automatic annotation framework for Classical Chinese event extraction based on two-level semantic mapping: using the modern Chinese translation of the classical text, combined with event semantics theory and semantic dependency parsing, we provide the LLM with rich semantic representations and then progressively map the semantic dependency relations onto concrete event information. With review and feedback from human annotators, the framework effectively overcomes the limitations of existing NLP tools and LLMs in annotating Classical Chinese event extraction. Experiments show our method not only improves the accuracy and efficiency of Classical Chinese event-extraction annotation but also reduces the dependence on experts and the manual annotation workload, offering a new methodology for low-resource language annotation and exploring a new direction for data annotation in the LLM era.”

pdf bib
基于文本风格迁移的中文性别歧视文本去毒研究(Research on detoxification of Chinese sexist texts based on text style transfer)
Peng Jian (彭健) | Zuo Jiali (左家莉) | Tan Jingxuan (谭景璇) | Wan Jianyi (万剑怡) | Wang Mingwen (王明文)

“Sexist speech exists to a certain extent on social media platforms, hindering a healthy internet and the development of a civilized society. Text style transfer can attenuate sexism in text and has been studied considerably for languages such as English, but work in Chinese is scarce for lack of datasets. Moreover, because Chinese is semantically rich and expressively diverse, the toxicity of sexist speech takes many forms, and existing methods that mostly adopt a single text-style-transfer model therefore work poorly. This paper proposes a Chinese sexist-text detoxification framework based on text style transfer: the framework first classifies text according to the form its toxicity takes, then applies different processing depending on that form; we also introduce a large language model (LLM) to build a lexicon of discriminatory terms. Experiments show the proposed model effectively handles sexism in Chinese text.”

pdf bib
基于问题扩展的散文答案候选句抽取方法研究(Research on a Question-Expansion-Based Method for Extracting Answer Candidate Sentences from Prose)
Lei Yang (雷洋) | Wang Suge (王素格) | Li Shuqi (李书琪) | Wang Hao (王浩)

“In prose reading comprehension, question stems are usually terse and their wording abstract, making it hard for machines to directly understand the question's meaning and requirements; meanwhile, prose passages are long and the answer candidate sentences are scattered across multiple paragraphs, making candidate extraction very challenging. This paper therefore proposes a question-expansion-based method for extracting answer candidate sentences from prose. First, a large language model extracts words in the passage related to the question stem to build a question-word expansion lexicon; next, the large language model's strong generative ability is used to rewrite the original question stem, which is further expanded with the lexicon; finally, the prose passage is processed in chunks, and a relevance model over the question and the passage sentences, based on global context and history information, is built to extract answer candidate sentences. Experiments on a prose reading comprehension dataset show the proposed method improves the accuracy of candidate extraction, providing technical support for answering generative questions in prose reading comprehension.”

pdf bib
基于预训练模型与序列建模的音素分割方法(Phoneme Segmentation Method Based on Pre-trained Models and Sequence Modeling)
Yang Shanglong (杨尚龙) | Yu Zhengtao (余正涛) | Wang Wenjun (王文君) | Dong Ling (董凌) | Gao Shengxiang (高盛祥)

“Phoneme segmentation, an important task in speech processing, is crucial for applications such as keyword spotting and automatic speech recognition. Traditional methods often predict independently whether each audio frame is a phoneme boundary, ignoring the intrinsic relations between phoneme boundaries, the whole audio sequence, and neighboring frames, which hurts segmentation accuracy and coherence. This paper proposes a phoneme segmentation method based on pre-trained models and sequence modeling: on top of acoustic features extracted by HuBERT, a BiLSTM captures long-range dependencies, and a CRF then optimizes the output sequence, improving phoneme-boundary detection. Experiments on TIMIT and Buckeye show the method outperforms existing approaches, demonstrating the effectiveness of sequence modeling for phoneme segmentation.”

pdf bib
近三十年域外汉籍研究的现状与展望—基于文献计量分析和知识图谱绘制(Extraterritorial Chinese Texts in the Last Thirty Years: Research Advances and Future Perspectives Based on Bibliometric Analysis and Knowledge Graph Mapping)
Tang Rongjun (唐榕骏) | Peng Zhifeng (彭志峰)

“Research on Chinese texts preserved abroad is of great significance to the transmission and dissemination of Chinese culture and has produced rich results, but it lacks bibliometric analysis. The purpose of this study is to analyze the evolution of Chinese research on extraterritorial Chinese texts over the past 30 years using high-quality CNKI journal papers. We study disciplinary trends with knowledge graphs, Price's theory, Leydesdorff's citation theory, the Spearman rank correlation coefficient, and keyword co-occurrence, clustering, burst, and timeline analyses. We find that “collation and digitization of Chinese texts” and “cultural exchange and cultural dissemination” are the frontier hotspots, revealing that work in the field has shifted from repatriation surveys toward applied transformation. The study provides reference data for the discipline's development.”

pdf bib
面向中文多方对话的机器阅读理解研究(Research on Machine Reading Comprehension for Chinese Multi-party Dialogues)
Jiang Yuru (蒋玉茹) | Li Yu (李宇) | Na Tingting (那婷婷) | Zhang Yangsen (张仰森)

“In machine reading comprehension, processing and analyzing multi-party dialogue has long been a challenging research task. Given the lack of relevant data resources for Chinese, this study builds the DialogueMRC dataset to advance the field. DialogueMRC, the first machine reading comprehension dataset for Chinese multi-party dialogue, contains 705 multi-party dialogue instances covering 24,451 utterance units and 8,305 QA pairs. Unlike previous MRC datasets, DialogueMRC emphasizes deep understanding of the dynamic dialogue process, placing higher demands on models' handling of the complexity of multi-party dialogue and their discourse-parsing ability. To meet the challenges of Chinese multi-party dialogue MRC, we propose the Discourse-Structure-aware QA Model for Chinese Multi-party Dialogue (DSQA-CMD), which integrates the QA and discourse-parsing tasks to improve understanding of the dialogue context. Experiments show clear advantages over typical fine-tuned pre-trained language models: compared with a Longformer-based method, DSQA-CMD improves F1 and EM on the MRC task by 5.4% and 10.0% respectively, and it also outperforms current mainstream large language models, demonstrating the effectiveness of the proposed approach.”

pdf bib
融合半监督学习与同义计算的传染病名称自动映射研究(A study on automatic mapping of infectious disease names by integrating semi-supervised learning and synonymy computation)
Song Peiyan (宋培彦) | Yang Qingxiang (杨青香) | Hu Boshen (胡博深) | Du Boya (杜博雅)

“Ancient medical texts contain rich professional knowledge, but inconsistencies between ancient disease names and terms and modern standard expressions seriously affect the organization of public health knowledge and the quality of services. Existing research relies mainly on expert manual mapping and word-sense computation, which suffer from low efficiency and accuracy. A feasible approach is to mine dictionaries of classical terminology as corpora, establish synonymy between traditional and modern medical terms, and map them to international standards, forming a knowledge base interlinking ancient, modern, and foreign terms. Grounded in knowledge organization and knowledge discovery theory, this paper designs a cross-lingual automatic mapping method between ancient and modern disease names and validates it on infectious disease names. Specifically: first, the snowball algorithm extracts synonymy patterns between ancient and modern disease names, yielding 12 disease-name relation patterns related to infectious diseases and 134 synonym pairs. Second, Sankey diagrams are used to visualize and analyze the diachronic evolution of disease names from the angles of relatedness, maturity, and extensibility. Meanwhile, combining SapBERT vectors with cosine similarity, traditional-medicine disease names are mapped to the ICD-11 international standard; after human verification, the mapping reaches hit@1 of 0.23, hit@5 of 0.42, and hit@10 of 0.61. We find that specialized dictionary corpora can reveal the linguistic variation of disease names, improve the efficiency of discovering synonymous terms, and provide more entry words and semantic links for building professional knowledge bases, easing the information-silo problem. The study also shows that taking the modern medical terms in dictionaries as the mapping origin and linking them to the ICD-11 international standard offers a reference for cross-lingual domain knowledge engineering, with practical significance for putting ancient professional knowledge to modern use and for international dissemination.”

pdf bib
中文语法纠错的多轮解码方法研究(Multi-Turn Decoding for Chinese Grammatical Error Correction)
Wang Xiaoying (王晓盈) | Mu Lingling (穆玲玲) | Xu Hongfei (许鸿飞)

“On grammatical error correction (GEC), sequence-to-sequence (seq2seq) models achieve performance comparable to or better than sequence-to-edit (seq2edit) models. Seq2edit models usually decode over multiple iterations, while seq2seq models decode once, left to right, without considering subsequent words. Applying multi-turn decoding (MTD) to seq2seq models, iteratively refining the previous round's correction, may therefore further improve performance. However, multi-turn decoding increases the computational cost of inference, and deletion or substitution operations in earlier rounds may lose useful source-sentence information from the original input. This paper proposes an early-stopping mechanism to improve efficiency and, to address the loss of source information, merges the original input with the previous round's correction into a single sequence. Experiments on the NLPCC2018 test set, the FCGEC validation set, and the NaCGEC test set show the method brings consistent and significant gains over a BART baseline, improving F0.5 by +2.06, +2.31, and +3.45, reaching F0.5 scores of 47.34, 54.58, and 62.09 respectively.”
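A minimal sketch of the multi-turn decoding loop with the two fixes described above, stopping early when the output converges and merging the original source with the previous correction; `correct_fn` and the separator token are hypothetical stand-ins for the seq2seq model and its input format.

```python
def multi_turn_correct(source, correct_fn, sep="[SEP]", max_turns=3):
    """Iteratively refine a GEC hypothesis while keeping the original
    source visible in every round."""
    prev = correct_fn(source)
    for _ in range(max_turns - 1):
        merged = source + sep + prev   # original input + previous correction
        curr = correct_fn(merged)
        if curr == prev:               # early stopping: output converged
            break
        prev = curr
    return prev
```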

pdf bib
中西谚语多元价值观资源库建设及对比研究(The construction and comparative study of the resource library of Chinese and Western proverbs and multiple values)
Du Xia (杜霞) | Liu Pengyuan (刘鹏远) | Yu Dong (于东)

“Chinese and Western proverbs are the crystallization of Chinese and Western cultures and embody each culture's most basic values. At present, however, resources linking Chinese and Western proverbs to values are lacking, making comprehensive study of the values the proverbs express, especially quantitative comparative study, difficult. This paper therefore designs a multi-dimensional value system covering motivations and needs, shared and culture-specific values, value judgments, and usage scenarios; based on this system we build a resource library of Chinese and Western proverbs with multiple values and conduct a survey and comparative analysis. We find that Chinese and Western proverbs are similar in value judgments, usage scenarios, and some values, while each is distinctive in the expression of specific content.”

pdf bib
从句子图到篇章图——基于抽象语义表示的篇章级共指标注体系设计(Discourse-Level Anaphora Annotation System Based on Abstract Semantic Representation)
Zhang Yixuan (张艺璇) | Li Bin (李斌) | Xu Zhixing (许智星) | Lu Pengxiu (卢芃秀)

“Discourse coreference reflects the dynamic movement of discourse concepts and has become a recent research focus. Building on a review of coreference theory, this paper surveys related corpora and parsing methods and finds that coreference corpora still have two problems: coarse annotation of coreference relations and little integration with whole-sentence semantic representation. We build a discourse coreference scheme on the basis of a sentence-level semantic annotation scheme (Chinese Abstract Meaning Representation) and construct a 100-document coreference corpus. The scheme covers 52 intra-sentence semantic relations and 8 discourse coreference relations; combining the two yields discourse coreference semantic graphs, providing a new framework and data resource for discourse-level semantic analysis.”

pdf bib
汉语中介语词同现网络研究(A Study on Chinese Interlanguage Co-occurrence Networks)
Qian Long (钱隆) | Zhao Huizhou (赵慧周) | Ding Qian (丁芊) | Wang Zhimin (王治敏)

“In recent years, applying complex-network methods to linguistic research has become a new path in digital humanities. Based on 214 written compositions by Japanese learners of Chinese, this paper builds 6 Chinese interlanguage word co-occurrence networks at different proficiency levels and examines their structural properties and dynamic evolution. The results show that all the Chinese interlanguage co-occurrence networks exhibit complex-network properties such as small-world structure, scale-freeness, disassortativity, and hierarchical structure. These properties reveal specific patterns in learners' vocabulary use: lower-level learners tend to link low-frequency words to high-frequency words, possibly reflecting acquisition patterns that reduce cognitive load; as learners' proficiency rises, interlanguage network parameters gradually approach native speakers' but cannot reach the native level; in addition, we observe that language errors affect interlanguage network structure, causing structural variation.”

pdf bib
基于意合图语义理论的结构标注体系与资源建设(System and Resource Construction Based on the Semantic Theory of Chinese-Parataxis-Graph)
Guo Mengxi (郭梦溪) | Li Meng (李梦) | Xun Endong (荀恩东) | Rao Gaoqi (饶高琦) | Yu Zhongyang (于钟洋)

“The parataxis graph is an event-centered, multi-level semantic representation method composed of event structure and entity structure; through a multi-level semantic scheme it enables multi-level analysis of events. This paper refines and formulates annotation guidelines for parataxis graphs, adopts a layered, graded annotation strategy, and carries out parataxis-graph QNP annotation of news corpora and international Chinese-education reading corpora in a self-developed online annotation system. The annotation validates the soundness and annotatability of the parataxis-graph scheme and yields a parataxis-graph semantic resource library.”

pdf bib
意合图:中文多层次语义表示方法(Parataxis Graph: Multi-level Semantic Representation Method for Chinese)
Guo Mengxi (郭梦溪) | Xun Endong (荀恩东) | Li Meng (李梦) | Rao Gaoqi (饶高琦)

“Although parameter-based semantic representation has achieved much, symbolic semantic representation retains undeniable significance. Grounded in semantics and fully considering the needs of symbolic semantic representation in NLP applications, we propose the parataxis graph, a multi-level semantic representation method that is both general and extensible. Centered on events and composed of event structure and entity structure, it improves the ability to adapt to application scenarios through a multi-level semantic scheme and strives for a uniform representation of language units at different levels. It has achieved good results in resource construction and related analysis experiments. This paper focuses on the design philosophy of the parataxis graph and its multi-level semantic scheme.”

pdf bib
L1到L2的跨语言激活路径研究——基于词汇识别的ERP数据(The Impact of Second Language Experience on Native Language Processing Across Different Language Modes)
Yang Siqin (杨思琴) | Hu Mei (胡美) | Jiang Minghu (江铭虎)

“Using event-related potentials (ERPs), this study explores, under different language modes, whether second-language learners' L2 experience affects native-language processing. Two groups of Chinese learners of Japanese were recruited, participating respectively in a short-version experiment approximating a bilingual mode and a long-version experiment approximating a monolingual mode. Statistical results show that in the short version, with Chinese-Japanese homographs as primes, the N400 amplitude difference induced by semantic relatedness was not significant, but the LPC amplitude difference was; in the long version, both the N400 and LPC amplitude differences induced by semantic relatedness were significant. We therefore infer that when participants process native-language semantics in a native-language environment, L2 semantics are activated and affect L1 processing in the near-bilingual short version, but only within the N400 time window; in the near-monolingual long version, L2 semantics did not affect L1 processing in either time window. The study extends our understanding of how L2 experience affects L1 processing along the two dimensions of language mode and time window, with important theoretical and applied value for building high-quality computational models and systems of human language.”

pdf bib
大语言模型故事理解能力评价数据集(Benchmarking story comprehension ability of large language model)
Yan Guohang (闫国航) | Guo Yaxin (郭亚鑫) | Tan Hongye (谭红叶) | Zhang Hu (张虎)

“Stories contain abundant social, physical, and other commonsense and embody profound morals, serving as important vehicles for knowledge dissemination, cultural transmission, and value shaping. Story comprehension is an important NLP task. In recent years researchers have extensively evaluated and analyzed the language understanding of large language models (LLMs), but because existing story-comprehension datasets mostly contain entity-type questions whose answers appear in the text, evaluation and analysis of LLMs' story comprehension has been very limited. We therefore build CRMUS, a fable-comprehension dataset, and, following the human cognitive process of story understanding, first commonsense reasoning, then grasping the story's moral, design two tasks to evaluate the corresponding model abilities. Based on the CRMUS dataset, we evaluate several representative LLMs and find that LLMs can already understand and reason over the commonsense in stories fairly well, but there is still much room for improvement in understanding story morals. In addition, we analyze the dataset's quality with item response theory (IRT), showing it is of high quality and can effectively evaluate LLMs.”

pdf bib
大语言模型开放性生成文本中的职业性别偏见研究(Research on Occupational Gender Bias in Open-Ended Texts Generated by Large Language Models)
Zhang Xu (张旭) | Guo Mengqing (郭梦清) | Zhu Shucheng (朱述承) | Yu Dong (于东) | Liu Ying (刘颖) | Liu Pengyuan (刘鹏远)

“Since their advent, large language models have achieved astonishing results on many natural language processing tasks. But potential safety and fairness issues have also drawn attention, in particular that generated text may contain bias and discrimination against groups defined by occupation, gender, and so on. Using two forms of gender representation, this paper constructs explicit and implicit “gender + occupation” prompts, prompts large language models to generate open-ended text, and analyzes the bias in the generated text along three dimensions, sentiment polarity, lexical richness, and degree of offensiveness, evaluating and comparing the intersectional occupation-gender bias (explicit and implicit) of traditional models and of large language models represented by ChatGPT. The results show that, compared with single-dimension occupation or gender identity information, more complex intersectional occupation-gender identity information reduces bias in ChatGPT's generated text, with sentiment polarity tending toward neutral and lexical richness increasing; ChatGPT shows differing attitudes toward different occupation-gender identities, with higher sentiment polarity for creative occupations such as investigative and artistic ones and lower polarity for people-facing occupations such as conventional and enterprising ones; moreover, compared with the earlier GPT-2 model, ChatGPT has progressed in generation ability and bias elimination, producing more positive and diverse text under various combined identity prompts, with offensive content significantly reduced.”

pdf bib
大语言模型在中文文本纠错任务的评测(Evaluation of large language models for Chinese text error correction tasks)
Mu Lingling (穆玲玲) | Wang Xiaoying (王晓盈) | Cui Jiajia (崔佳佳)

“The capabilities of large language models (LLMs) on natural language processing tasks such as information extraction and machine translation have been widely evaluated, but for text error correction, work has mainly been limited to assessing GPT's English grammatical error correction. Chinese text error correction comprises two subtasks: Chinese grammatical error detection (CGED) and Chinese grammatical error correction (CGEC). This paper uses prompting to evaluate mainstream Chinese and international large models on both tasks. We design different prompting strategies and analyze the results holistically and at fine granularity. Experiments on the NLPCC2018 and CGED2018 test sets show that ERNIE-4 and ChatGLM-4 outperform GPT-3.5-Turbo and LLaMa-2-7B-Chat at Chinese text correction, that the few-shot chain-of-thought prompting strategy performs best, and that correction accuracy is relatively high on word-order and spelling errors, indicating large models have good low-resource Chinese text-correction ability. However, the test results show the large models' recall is at least 14 percentage points higher than the baseline models', indicating that large models over-correct on Chinese text error correction.”

pdf bib
面向“以A为B”构式语义场景的汉语框架识别数据集构建(Dataset for Recognizing Chinese Semantic Frames based on the Semantic Scenario of the “Yi A Wei B” Construction)
Yang Peiyuan (杨沛渊) | Su Xuefeng (苏雪峰) | Li Juncai (李俊材) | Yan Zhichao (闫智超) | Chai Qinghua (柴清华) | Li Ru (李茹)

“Chinese contains many semantic scenarios whose semantic core is expressed not by a single word but by a particular structure in the sentence. Yet among currently published datasets, only very few treat such structures as semantic units. The Chinese FrameNet knowledge base is a high-quality resource for deep semantic analysis and reasoning in Chinese; at present, its basic frame-evoking unit is a single word in the sentence. Based on Chinese FrameNet, introducing construction grammar, and using the 2020 People's Daily corpus, this paper takes the “yi A wei B” construction as an example and builds a Chinese frame recognition dataset based on that construction, containing 23,849 example sentences and 141 corresponding frames. We experiment with several Chinese frame-recognition models and large language models on the dataset and, to address the difficulty traditional frame-recognition models face when the target word is a construction (recognition is hard because target-word information is scant), propose two methods based on target-word transformation and data augmentation, raising model accuracy to 88.19% and effectively strengthening the model's ability to mine the deep semantic information carried by constructions.”

pdf bib
上古汉语分词和词性标注语料库的构建(Construction of Ancient Chinese Word Segmentation and Part-Of-Speech Corpus)
Ke Yonghong (柯永红)

“Addressing the absence of an openly available large-scale Old Chinese corpus with word segmentation and POS tagging in China, this paper proposes a mainly-manual, machine-assisted annotation mode and builds a segmented and POS-tagged corpus of Old Chinese covering 46 texts. We describe the construction process, including corpus selection, text segmentation, POS tagging, and quality control, analyze the corpus's distributions of word length, word frequency, and word usage, and assess annotation quality. The completed corpus contains over 3.23 million characters and over 2.17 million words. Agreement with the EvaHan2022 base test set and blind test set is 93.70% and 89.49% for word segmentation and 92.83% and 89.86% for POS tagging, respectively. The corpus can serve Old Chinese research, dictionary compilation, language teaching, artificial intelligence, and other fields.”

pdf bib
图解句式结构体系及其树库构建(Diagrammatic Sentence Pattern Structure System and Its Treebank Construction)
Peng Weiming (彭炜明) | Zhao Min (赵敏) | Song Yuchen (宋昱辰) | Hu Jiajia (胡佳佳) | Song Tianbao (宋天宝) | Sui Zhifang (穗志方) | Song Jihua (宋继华)

“Sentence-pattern structure is a formalized syntactic structure based on sentence-based grammar, presenting sentence structure in a self-defined diagrammatic form. This paper proposes a sentence-pattern structure scheme covering clause structure, morphological structure, and inter-sentence structure, explains its design philosophy and the sentence-based principles of sentence analysis, and finally outlines the engineering progress of building a Chinese treebank based on the scheme.”

pdf bib
英语科技论文摘要语步结构语料库构建研究(Research on Construction of Corpus for Move Structures in Abstracts of English Scientific Research Articles)
Li Hongzheng (李洪政) | Wang Ruojin (王若锦) | Feng Chong (冯冲) | Liu Fang (刘芳)

“Move structures are textual discourse units in academic papers, valuable for move analysis, paper writing, and related purposes. Although research on moves in academic papers is abundant, annotated move data remain relatively scarce. This study develops a corpus of move-structure annotations for the abstracts of English scientific research articles, currently with nearly 34,000 annotated move structures spanning disciplines including natural language processing, computer vision, communications engineering, and mechanical engineering, together with statistics and analysis of the annotated data. The first stage of corpus construction relied on manual annotation to produce high-quality data; the second and main stage used a BERT-based automatic recognition and annotation model, which increases annotation speed and scale while maintaining quality. Using the corpus, we conducted move-structure recognition experiments across disciplines, comparing our model with large language models such as ChatGPT and Claude3. Results show our model's F1 exceeds the large language models' on all move types, demonstrating its effectiveness. The corpus is publicly available and can supply necessary data for NLP tasks such as scientific-paper information extraction and intelligent correction of English writing, for English-for-academic-purposes teaching and research, and can also effectively promote the digital transformation of foreign-language education.”

pdf bib
Self-Guide:一种基于自我规划的大语言模型推理增强方法(Self-Guide: Enhancing LLM Reasoning Ability via Self-Plan)
Liu Yibin (刘艺彬) | Liu Zhenghao (刘正皓) | Yan Yukun (闫宇坤) | Yu Shi (于是) | Wang Shuo (王硕) | Yang Liner (杨麟儿) | Chen Huimin (陈慧敏) | Gu Yu (谷峪) | Yu Ge (于戈)

“Although large language models have made notable progress on natural language processing tasks, they still face cognitive-load problems in areas such as complex reasoning: during reasoning, a large language model must memorize and process large amounts of information. How to effectively reduce the cognitive load during language-model reasoning and relieve the cognitive overload that may arise is therefore a pressing problem. This paper proposes the Self-Guide method for strengthening language-model reasoning. The method guides the large language model to generate commonsense knowledge and reasoning guidance, letting the model enhance its reasoning through self-planning, and calibrates the model's reasoning process by combining with the chain of thought. Unlike existing methods, we significantly improve reasoning performance without fine-tuning the large language model or using external tools. Experiments show Self-Guide significantly outperforms baseline methods on four common reasoning tasks, and, compared with the conventional chain-of-thought model, also generalizes well on models with weaker reasoning ability. By combining large language models' self-planning and reasoning abilities, Self-Guide offers a new, effective route to improving language-model reasoning.”

pdf bib
基于大模型的交互式谎言识别:数据和模型(Unveiling Lies: Enhancing Large Language Models for Real-World Lie Detection in Interactive Dialogues)
Ji Chengwei (纪程炜) | Wang Siyuan (王思远) | Li Taishan (李太山) | Mou Xinyi (牟馨忆) | Zhao Limin (赵丽敏) | Xue Lanqing (薛兰青) | Ying Zhenzhe (应缜哲) | Wang Weiqiang (王维强) | Huang Xuanjing (黄萱菁) | Wei Zhongyu (魏忠钰)

“Lie detection over interactive dialogue has broad application needs in different scenarios. Existing lie-detection techniques usually make a final decision at the whole-dialogue level, lacking logical analysis of fine-grained deception features and cues, and thus fail to meet the interpretability needs of these scenarios. This paper proposes the notions of deception indicators and semantic-inconsistency cues to help identify lies in dialogue and improve the interpretability of lie detection. The paper also proposes a lie-detection framework for training a lie-detection large language model (LD-LLM), which exploits fine-grained deception indicators and detects whether semantic-inconsistency cues exist in the dialogue, achieving more reliable lie detection. We build two lie-detection datasets in real interaction scenarios, FinLIE and IDLIE, focusing respectively on financial risk control and identity verification. Experiments show that LD-LLM, fine-tuned on instruction datasets created from the two datasets, reaches the state of the art in lie detection over real interactions.”

pdf bib
基于动态聚类与标签空间映射的上下文学习模板构建方法(In-Context Learning Demonstration Construction Method based on Dynamic Clustering and Label Space Mapping)
Zhang Qi (张琦) | Jin Xingnan (金醒男) | Pei Yu (裴誉) | Du Yongping (杜永萍)

“Given natural-language instructions, large language models can produce the expected output, reflecting their in-context learning ability. In-context learning performance is closely related to the quality of the in-context demonstrations, and existing work usually builds demonstrations with a single selection algorithm, which cannot fully elicit in-context learning ability. This paper proposes an in-context learning demonstration-construction method based on dynamic clustering and label-space mapping: it dynamically selects relevant examples and further proposes a cluster-based filtering method to select diverse examples from different semantic clusters. A loss-based ranking and selection method then evaluates a demonstration's ability to learn the correct label-space mapping distribution, and ranking yields the final demonstration set. Experiments on natural language inference and other tasks show the proposed method improves accuracy by up to 3.2% and 8.9% on two different large language models.”
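A minimal sketch of the clustering side of the method, assuming sentence embeddings for the candidate pool are available: candidates are grouped into as many semantic clusters as demonstrations needed, and the example nearest to the query is taken from each cluster, giving relevance plus diversity. The loss-based re-ranking stage described above is only noted in a comment.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_demonstrations(example_embeddings, query_embedding, n_demos=4):
    """Cluster candidates into n_demos semantic clusters, then pick the
    member of each cluster closest to the query. A loss-based ranking
    step (as in the paper) could reorder the returned indices."""
    km = KMeans(n_clusters=n_demos, n_init=10).fit(example_embeddings)
    chosen = []
    for c in range(n_demos):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(example_embeddings[members] - query_embedding, axis=1)
        chosen.append(int(members[dists.argmin()]))
    return chosen
```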

pdf bib
基于领域信息分解式学习的大语言模型修辞认知增强方法(Method for Enhancing Rhetorical Cognition of Large Language Models Based on Decomposed Learning of Field Information)
Wang Wen (王雯) | Yu Dong (于东) | Liu Pengyuan (刘鹏远)

“Chinese rhetorical devices are diverse and conceptually heterogeneous, and large language models have cognitive deficits for some of them. Addressing this problem, this paper studies how to strengthen large language models' rhetorical cognition and explores its relation to rhetoric-recognition performance. We propose the QAKAG framework: it first introduces the idea of decomposed information learning, probing the large language model's rhetorical-cognition deficits through question answering, then explores the optimal information-supplementation mechanism with four different knowledge-combination modes, achieving enhanced rhetorical cognition in large language models. We build MCRSD, a multi-category Chinese rhetorical-sentence dataset, and MCRKB, a rhetorical knowledge base, and conduct experiments on six large language models including ChatGPT4, verifying the effectiveness of the QAKAG framework for enhancing rhetorical cognition and the necessity of each of its stages. The results show that under QAKAG, the six models' average F1 on multi-category rhetoric recognition improves by 22.1% over answering the recognition questions directly, outperforming the Zero-shot-CoT, RAG-BaiKe, and Few-Shot5 prompting strategies.”

pdf bib
基于中间层对齐的异构师生模型知识蒸馏(Knowledge distillation of heterogeneous teacher-student model with intermediate layer loss)
Zhai Feiyan (翟飞燕) | Wang Renzhi (王任之) | Li Piji (李丕绩)

“As a frontier model-compression strategy in the era of large language models, knowledge distillation effectively transfers the knowledge of complex models to simple ones, significantly reducing parameter scale and computational cost. Nevertheless, current mainstream distillation algorithms for generative large language models focus on optimizing the last-output-layer loss between teacher and student, neglecting the models' intermediate layers. Moreover, research on intermediate-layer distillation usually imposes strict structural-consistency requirements on teacher and student and cannot handle distillation between heterogeneous models, a clear limitation. Addressing these problems, we propose a new knowledge distillation algorithm: a distillation algorithm for heterogeneous generative teacher-student large language models that introduces an intermediate-layer distillation loss. The algorithm first extracts intermediate-layer information from teacher and student as distillation targets, then achieves intermediate-layer-based knowledge alignment and loss computation between heterogeneous models through specially designed layer-mapping rules and alignment modules, and finally jointly optimizes the weights of the various distillation losses. Experiments on five related datasets show our method's clear advantages in distillation effectiveness.”

pdf bib
面向小规模大语言模型推理优化的推理路径排序方法(A Reasoning Paths Ranking Method for Reasoning Optimization of Small-scale Large Language Models)
Li Jun (李俊) | Bai Yu (白宇) | Liu Yuting (刘雨婷)

“Although large language models (LLMs) have achieved great success in natural language processing, training their hundred-billion-parameter scale also incurs enormous computational cost. Small-scale large language models (SLLMs), an alternative for deploying LLMs in low-resource settings, still show a clear gap to LLMs in task-handling ability. Although prompting methods such as in-context learning (ICL) improve SLLMs' problem-handling to some extent, manually built prompts often require participants to have specific domain expertise, challenging the universal adoption of LLMs. Addressing these problems, this paper proposes an SLLM-based question-reasoning framework that inserts, between the reasoning-path generation stage and the answer generation stage, a reasoning-path ranking and selection mechanism based on a step-by-step semantic verifier (SSVRP), improving SLLM reasoning without human intervention. Experiments show SSVRP effectively strengthens SLLM reasoning, reaching average accuracies of 54.3%, 90.6%, 64.3%, and 63.7% on four reasoning tasks and new SOTA results on three of them.”
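A minimal sketch of ranking sampled reasoning paths with a step-level verifier, in the spirit of SSVRP; `generate_paths`, `verify_step`, and the mean-score aggregation are all assumptions standing in for the SLLM sampler and the paper's semantic verifier.

```python
def answer_with_path_ranking(question, generate_paths, verify_step, top_k=1):
    """Sample several candidate reasoning paths, score each by verifying
    its steps one at a time, and keep the best-scoring path(s)."""
    paths = generate_paths(question, n=8)   # list of paths; each path is a list of steps

    def score(path):
        step_scores = [verify_step(question, step) for step in path]
        return sum(step_scores) / max(len(step_scores), 1)

    ranked = sorted(paths, key=score, reverse=True)
    return ranked[:top_k]   # answer is read off the top-ranked path
```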

pdf bib
面向中文实体识别的Transformers模型句子级非对抗鲁棒性研究(On Sentence-level Non-adversarial Robustness of Chinese Named Entity Recognition with Transformers Model)
Wang Libang (王立帮) | Wang Peiyan (王裴岩) | Shen Sijia (沈思嘉)

“Transformer-based Chinese named entity recognition models achieve excellent performance on standard NER benchmarks, and their robustness has also received wide attention. At present, the sentence-level non-adversarial robustness problems that Chinese NER models face in real deployment are under-studied; this paper investigates that problem. First, we theoretically analyze and identify the negative effects of self-attention, relative position embeddings, and absolute position embeddings in the Transformer on model robustness. We then propose robustness-enhancement methods based on entity-label augmentation and sliding-window constraints and prove theoretically that they improve the entity-recognition robustness of Transformer models. Finally, through experiments on 3 Chinese datasets, we study the vulnerability of 4 Transformer-based entity recognition models; the proposed methods improve the models' robustness F1 by up to 4.95%.”

pdf bib
银瞳:基于自适应语义空间学习的中文金融多任务大模型(SilverSight: A Multi-Task Chinese Financial Large Language Model Based on Adaptive Semantic Space Learning)
Zhou Yuhang (周宇航) | Li Zeping (李泽平) | Tian Siyu (田思雨) | Ni Yuchen (倪雨琛) | Zhang Jian (张健) | Liu Xiang (刘响) | Ye Guangnan (叶广楠) | Wu Jie (吴杰) | Chai Hongfeng (柴洪峰)

“Large language models are increasingly being used in various vertical domains, drawing on their broad knowledge reserves to empower many scenarios within each domain. However, each domain has many specific tasks to be learned, and multi-source heterogeneous domain data easily causes conflicts when the model transfers across tasks. Based on this, this study proposes an adaptive semantic space learning framework that uses adaptive redistribution of the data within the semantic space to improve the performance and selection of multi-expert models, and uses this framework to train SilverSight, a financial multi-task large model. The results show that our framework can reach performance close to full-data training using only 10% of the data, with strong generalization.”

pdf bib
Enhancing Free-Form Table Question Answering Models by Distilling Relevant-Cell-Based Rationales
Yang Zhiyu | Wang Shuo | Yan Yukun | Liu Pengyuan | Yu Dong

“Free-form table question answering is a challenging task since tables contain structured contents compared to plain texts, which requires high-level reasoning abilities to effectively identify cells that are relevant to the question and produce a correct and faithful answer based on their relations. Large language models (LLMs) have exhibited remarkable reasoning capabilities in numerous NLP applications. However, in some specific tasks, specially-trained small models can still outperform LLMs. Furthermore, small models require extremely less computation costs compared to LLMs. To leverage the strengths of both types of models, we propose a Relevant-Cell-based Knowledge Distillation with inference-time Teacher Guidance (RCKD-TG) method. This approach aims to combine small free-form table question answering models' abilities to learn from human annotations and large language models' abilities to effectively reason from table contents, via applying Relevant-Cell-based rationales distilled from LLMs to small models' training and inference stages. Our experiments demonstrate the superiority of our method over vanilla small models in correctness, faithfulness, adequacy and fluency, also over general LLMs in adhering to the style of human annotations. We achieve state-of-the-art performance on FeTaQA, a representative free-form table question answering benchmark. Our result of a 41.3 BLEU score demonstrates the feasibility of effectively using small models' task-specific abilities and LLMs' reasoning capabilities at the same time. Additionally, our method exhibits high computation efficiency and data efficiency. Compared to strong baselines, we achieve better performance with significantly less training data.”

pdf bib
Enhancing Sequence Representation for Personalized Search
Wang Shijun | Zhang Han | Yuan Zhe

“The critical process of personalized search is to reorder candidate documents of the current query based on the user's historical behavior sequence. There are many types of information contained in the user historical information sequence, such as queries, documents, and clicks. Most existing personalized search approaches concatenate these types of information to get an overall user representation, but they ignore the associations among them. We believe the associations of the different information mentioned above are significant to personalized search. Based on a hierarchical transformer as base architecture, we design three auxiliary tasks to capture the associations of different information in the user behavior sequence. Under the guidance of mutual information, we adjust the training loss, enabling our PSMIM model to better enhance the information representation in personalized search. Experimental results demonstrate that our proposed method outperforms some personalized search methods.”

pdf bib
Joint Similarity Guidance Hash Coding Based on Adaptive Weight Mixing Strategy For Cross-Modal Retrieval
Sun Yaqi | Yun Jing | Zhuoqun Ma

“There is a continuous and explosive growth of multimodal data. Efficient cross-modal hashing retrieval is of significant importance in conserving computational resources. To further enhance the attention to informative data within modalities and capture the semantic correlations in cross-modal data, we propose an enhanced deep Joint-Semantics Reconstructing Hashing algorithm, which is the Joint Similarity Guidance Hash Coding Based on Adaptive Weight Mixing Strategy (JSGHCA). The algorithm focuses on delving deeper into the correlations of cross-modal data. We introduce the adaptive weight mixing strategy to construct the semantic affinity matrix, so that the matrix can identify each modality's data with a specific weight in each batch. At the same time, in the process of hash code generation, we introduce a collaborative attention mechanism. It helps the model pay more attention to the local information of each modality, thereby capturing the semantic features within each modality more accurately. Additionally, it enables the model to jointly process the attention across different modalities and extract shared semantic features more precisely. Experimental results show that the proposed model is significantly better than the deep joint-semantics reconstructing hashing algorithm on multiple benchmark datasets.”

pdf bib
Generate-then-Revise: An Effective Synthetic Training Data Generation Framework For Event Detection Retrieval
Du Huidong | Sun Hao | Liu Pengyuan | Yu Dong

“Large language models (LLMs) struggle with event detection (ED) due to the structured and variable number of events in the output. Existing supervised approaches rely on a large amount of manually annotated corpora, facing challenges in practice when event types are diverse and the annotated data is scarce. We propose Generate-then-Revise (GtR), a framework that leverages LLMs in the opposite direction to address these challenges in ED. GtR utilizes an LLM to generate high-quality training data in three stages, including a novel data revision step to minimize noise in the synthetic data. The generated data is then used to train a smaller model for evaluation. Our approach demonstrates significant improvements on low-resource ED. We further analyze the generated data, highlighting the potential of synthetic data generation for enhancing ED performance.”

pdf bib
E3: Optimizing Language Model Training for Translation via Enhancing Efficiency and Effectiveness
Chen Linqing | Wang Weilei | Hu Dongyang

“In the field of Natural Language Processing (NLP), Large-scale Language Models (LLMs) have demonstrated exceptional capabilities across a variety of tasks, including question answering, classification, and particularly, natural language understanding. The integration of neural machine translation with LLMs presents significant potential, transforming the paradigms of cross-lingual communication and information exchange. This study investigates the foundational aspects of LLMs' translation abilities and identifies effective training methodologies to equip them with multilingual capacities. We specifically explore the optimal timing for introducing translation capabilities to LLMs via supervised tasks, considering the inherent bilingual nature of machine translation. Key questions explored include whether it is more beneficial to integrate multiple languages during the pre-training or supervised fine-tuning (SFT) stages, how variations in language ratios influence LLMs' translation abilities, and whether longer or shorter texts are more effective for training these models. This research conducts a thorough investigation by training multiple LLMs from scratch with parameter scales in the billions and enhances the robustness of our findings by upgrading the language capabilities of pre-trained open-source models with parameter scales reaching tens of billions. The aim is to provide a detailed analysis that elucidates the complexities of augmenting machine translation capabilities within LLMs.”

pdf bib
Multi-features Enhanced Multi-task Learning for Vietnamese Treebank Conversion
Zhang Zhenguo | Liu Jianjian | Ying Li

“Pre-trained language representation-based dependency parsing models have achieved obvious improvements in rich-resource languages. However, these models' performance depends significantly on the quality and scale of the training data. Compared with Chinese and English, the scale of Vietnamese dependency treebanks is limited. Considering that human annotation is labor-intensive and time-consuming, we propose a multi-features enhanced multi-task learning framework to convert all heterogeneous Vietnamese treebanks to a unified one. On the one hand, we exploit Tree BiLSTM and pattern embedding to extract global and local dependency tree features from the source treebank. On the other hand, we propose to integrate these features into a multi-task learning framework to use the source dependency parsing to assist the conversion processing. Experiments on the benchmark datasets show that our proposed model can effectively convert heterogeneous treebanks, thus further improving Vietnamese dependency parsing accuracy by about 7.12 points in LAS.”

pdf bib
SimCLNMT: A Simple Contrastive Learning Method for Enhancing Neural Machine Translation Quality
Xu Menglong | Zhang Yanliang

“Neural Machine Translation (NMT) models are typically trained using Maximum Likelihood Estimation (MLE). However, this approach has a limitation: while it might select the best word for the immediate context, it does not generally optimize for the entire sentence. To mitigate this issue, we propose a simple yet effective training method called SimCLNMT. This method is designed to select words that fit well in the immediate context and also enhance the overall translation quality over time. During training, SimCLNMT scores multiple system-generated (candidate) translations using the logarithm of conditional probabilities. It then employs a ranking loss function to learn and adjust these probabilities to align with the corresponding quality scores. Our experimental results demonstrate that SimCLNMT consistently outperforms traditional MLE training on both the NIST English-Chinese and WMT'14 English-German datasets. Further analysis also indicates that the translations generated by our model are more closely aligned with the corresponding quality scores. We release our code at https://github.com/chaos130/fairseq_SimCLNMT.”
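A minimal sketch of a pairwise ranking loss over candidate log-probabilities, as the abstract describes; the margin value and the pairwise formulation are assumptions rather than the paper's exact objective.

```python
import torch

def ranking_loss(log_probs, quality_scores, margin=0.01):
    """For every candidate pair where one translation has a higher quality
    score, require the model's log-probability of the better candidate to
    exceed the worse one's by a margin (hinge on the difference)."""
    loss = log_probs.new_zeros(())
    n = len(quality_scores)
    for i in range(n):
        for j in range(n):
            if quality_scores[i] > quality_scores[j]:
                loss = loss + torch.clamp(margin - (log_probs[i] - log_probs[j]), min=0)
    return loss

# Usage: log_probs are candidate log P(y|x); quality scores from BLEU etc.
lp = torch.tensor([-10.2, -11.5, -9.8], requires_grad=True)
print(ranking_loss(lp, [0.6, 0.3, 0.8]))
```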

pdf bib
Translate-and-Revise: Boosting Large Language Models for Constrained Translation
Huang Pengcheng | Mu Yongyu | Wu Yuzhang | Li Bei | Xiao Chunyang | Xiao Tong | Jingbo Zhu

“Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prompts. However, LLMs cannot always guarantee the adequacy of translation, and, in some cases, ignore the given constraints. This is in part because LLMs might be overly confident in their predictions, overriding the influence of the constraints. To overcome this overriding behaviour, we propose to add a revision process that encourages LLMs to correct the outputs by prompting them about the constraints that have not yet been met. We evaluate our approach on four constrained translation tasks, encompassing both lexical and structural constraints in multiple constraint domains. Experiments show 15% improvement in constraint-based translation accuracy over standard LLMs, and the approach also significantly outperforms neural machine translation (NMT) state-of-the-art methods.”
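A minimal sketch of the translate-then-revise loop for lexical constraints, with `llm` as a stand-in text-completion function and substring matching as a simplified constraint check; the actual prompts and the handling of structural constraints are not reproduced.

```python
def translate_with_revision(source, constraints, llm, max_revisions=2):
    """Translate with the constraints in the prompt, then re-prompt the
    LLM about only the constraints still missing from the output."""
    prompt = (f"Translate into English, using these terms: "
              f"{', '.join(constraints)}.\nSource: {source}\nTranslation:")
    translation = llm(prompt)
    for _ in range(max_revisions):
        unmet = [c for c in constraints if c not in translation]
        if not unmet:
            break                      # all lexical constraints satisfied
        prompt = (f"Revise the translation so it also uses: {', '.join(unmet)}.\n"
                  f"Source: {source}\nDraft: {translation}\nRevised:")
        translation = llm(prompt)
    return translation
```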

pdf bib
A Multi-Task Biomedical Named Entity Recognition Method Based on Data Augmentation
Zhao Hui | Zhao Di | Meng Jiana | Liu Shuang | Lin Hongfei

“The rapid development of artificial intelligence has led to an explosion of literature in the biomedical field, and Biomedical Named Entity Recognition (BioNER) can quickly and accurately identify key information from unstructured text. This task has become an important topic to promote the rapid development of intelligence in the biomedical field. However, in Named Entity Recognition (NER) in the biomedical field, there are persistent problems of unclear boundary recognition, underutilization of hierarchical information in sentences, and scarcity of training data resources. Based on this, this paper proposes a multi-task BioNER model based on data augmentation, using four data augmentation methods, Mention Replacement (MR), Label-wise Token Replacement (LwTR), Shuffle Within Segments (SiS) and Synonym Replacement (SR), to increase the training data. The syntactic information is extracted by incorporating the input sentence into a Graph Convolutional Network (GCN), and the tag information encoded by BERT then interacts with it through a co-attention mechanism to obtain an interaction matrix. Subsequently, NER is performed through boundary detection tasks and span classification tasks. Comparative experiments with other methods are conducted on the BC5CDR and JNLPBA datasets, as well as the CCKS2017 dataset. The experimental results demonstrate the effectiveness of the model proposed in this paper.”

pdf bib
Biomedical Event Causal Relation Extraction by Reasoning Optimal Entity Relation Path
Li Lishuang | Mi Liteng | Zhang Beibei | Xiang Yi | Feng Yubo | Qin Xueyang | Tang Jingyao

“Biomedical Event Causal Relation Extraction (BECRE) is an important task in biomedical information extraction. Existing methods usually use pre-trained language models to learn semantic representations and then predict the event causal relation. However, these methods struggle to capture sufficient cues in biomedical texts for predicting causal relations. In this paper, we propose a Path Reasoning-based Relation-aware Network (PRRN) to explore deeper cues for causal relations using reinforcement learning. Specifically, our model reasons over the relation paths between entity arguments of two events, namely entity relation paths, which connect the two biomedical events through the multi-hop interactions between entities to provide richer cues for predicting event causal relations. In PRRN, we design a path reasoning module based on reinforcement learning and propose a novel reward function to encourage the model to focus on the length and contextual relevance of entity relation paths. The experimental results on two datasets suggest that PRRN brings considerable improvements over the state-of-the-art models.”

pdf bib
Joint Entity and Relation Extraction Based on Bidirectional Update and Long-Term Memory Gate Mechanism
Qian Yili | Ren Enlong | Xu Haonan

“Joint entity recognition and relation extraction are important tasks in natural language processing. While some previous work has recognized the importance of relation information in joint extraction, excessively focusing on relation information without utilizing entity information may lead to information loss and affect the identification of relation tuples. Additionally, ignoring the utilization of original information may result in the loss of hierarchical and semantic information, further reducing the richness of information. To address these issues, we propose a bidirectional information updating mechanism that integrates entity and relation information, iteratively fusing fine-grained information about entities and relations. We introduce a long-term memory gate mechanism to update and utilize original information using feature information, thereby enhancing the model's ability for entity recognition and relation extraction. We evaluated our approach on two Chinese datasets and achieved state-of-the-art results.”

pdf bib
MFE-NER: Multi-feature Fusion Embedding for Chinese Named Entity Recognition
Li Jiatong | Meng Kui

“In Chinese Named Entity Recognition, character substitution is a complicated linguistic phenomenon. Some Chinese characters are quite similar as they share the same components or have similar pronunciations. People replace characters in a named entity with similar characters to generate a new collocation but refer to the same object. As a result, it always leads to unrecognizable or mislabeling errors in the NER task. In this paper, we propose a lightweight method, MFE-NER, which fuses glyph and phonetic features to help pre-trained language models handle the character substitution problem in the NER task with limited extra cost. Basically, in the glyph domain, we disassemble Chinese characters into Five-Stroke components to represent structure features. In the phonetic domain, an improved phonetic system is proposed in our work, making it reasonable to describe phonetic similarity among Chinese characters. Experiments demonstrate that our method performs especially well in detecting character substitutions while slightly improving the overall performance of Chinese NER.”

pdf bib
UDAA: An Unsupervised Domain Adaptation Adversarial Learning Framework for Zero-Resource Cross-Domain Named Entity Recognition
Li Baofeng | Tang Jianguo | Qin Yu | Xu Yuelou | Lu Yan | Wang Kai | Li Lei | Zhou Yanquan

“The zero-resource cross-domain named entity recognition (NER) task aims to perform NER in a specific domain where labeled data is unavailable. Existing methods primarily focus on transferring NER knowledge from high-resource to zero-resource domains. However, the challenge lies in effectively transferring NER knowledge between domains due to the inherent differences in entity structures across domains. To tackle this challenge, we propose an Unsupervised Domain Adaptation Adversarial (UDAA) framework, which combines a masked language model auxiliary task with a domain-adaptive adversarial network to mitigate inter-domain differences and efficiently facilitate knowledge transfer. Experimental results on three datasets, CBS, Twitter, and WNUT2016, demonstrate the effectiveness of our framework. Notably, we achieved new state-of-the-art performance on all three datasets. Our code will be released.”
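The adversarial component can be illustrated with the standard gradient reversal layer from domain-adversarial training (Ganin & Lempitsky, 2015). Whether UDAA uses exactly this operator is not stated in the abstract, so treat the following as a generic sketch of the mechanism.

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Gradient reversal layer: identity on the forward pass, sign-flipped
    (and scaled) gradients on the backward pass, so the encoder is pushed
    toward domain-invariant features."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Toy check: forward is identity, backward flips the sign.
x = torch.randn(3, 4, requires_grad=True)
grad_reverse(x, 0.5).sum().backward()
print(x.grad[0, 0])  # -0.5 for every element
```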

pdf bib
Triple-view Event Hierarchy Model for Biomedical Event Representation
Huang Jiayi | Li Lishuang | Qin Xueyang | Xiang Yi | Li Jiaqi | Feng Yubo

“Biomedical event representation can be applied to various language tasks. A biomedical event often involves multiple biomedical entities and trigger words, and the event structure is complex. However, existing research on event representation mainly focuses on the general domain. If models from the general domain are directly transferred to biomedical event representation, the results may not be satisfactory. We argue that biomedical events can be divided into three hierarchies, each containing unique feature information. Therefore, we propose the Triple-view Event Hierarchy Model (TEHM) to enhance the quality of biomedical event representation. TEHM extracts feature information from three different views and integrates them. Specifically, due to the complexity of biomedical events, we propose a Trigger-aware Aggregator module to handle complex units within biomedical events. Additionally, we annotate two similarity task datasets in the biomedical domain using annotation standards from the general domain. Extensive experiments demonstrate that TEHM achieves state-of-the-art performance on biomedical similarity tasks and biomedical event causal relation extraction.”

pdf bib
DialectMoE: An End-to-End Multi-Dialect Speech Recognition Model with Mixture-of-Experts
Zhou Jie | Gao Shengxiang | Yu Zhengtao | Dong Ling | Wang Wenjun

“Dialect speech recognition has always been one of the challenges in Automatic Speech Recognition (ASR) systems. While many ASR systems perform well on Mandarin, their performance drops significantly when handling dialect speech. This is mainly due to the pronounced differences in pronunciation between dialects and Mandarin, and to the scarcity of dialect speech data. In this paper, we propose DialectMoE, a Chinese multi-dialect speech recognition model based on Mixture-of-Experts (MoE) for low-resource conditions. Specifically, DialectMoE assigns input sequences to a set of experts using a dynamic routing algorithm, with each expert potentially trained for a specific dialect. Subsequently, the outputs of these experts are combined to derive the final output. Due to the similarities among dialects, distinct experts may also offer assistance in recognizing other dialects. Experimental results on the Datatang dialect public dataset show that, compared with the baseline model, DialectMoE reduces the Character Error Rate (CER) for the Sichuan, Yunnan, Hubei, and Henan dialects by 23.6%, 32.6%, 39.2%, and 35.09% respectively. The proposed DialectMoE model demonstrates outstanding performance in multi-dialect speech recognition.”
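For readers unfamiliar with MoE routing, below is a minimal top-k routing sketch in PyTorch. The expert architecture, number of experts, and k are illustrative guesses, not DialectMoE's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Minimal Mixture-of-Experts layer with a learned top-k router,
    illustrating the dynamic routing idea; not the DialectMoE code."""

    def __init__(self, d_model=256, n_experts=4, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                          nn.Linear(d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                     # x: (batch, time, d_model)
        logits = self.router(x)               # (batch, time, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize top-k scores
        out = torch.zeros_like(x)
        for slot in range(self.k):            # combine selected experts
            for e, expert in enumerate(self.experts):
                mask = (idx[..., slot] == e).unsqueeze(-1)
                out = out + mask * weights[..., slot:slot + 1] * expert(x)
        return out

print(SimpleMoE()(torch.randn(2, 50, 256)).shape)
```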

pdf bib
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
Zhang ChuYuan | Yi Jiangyan | Tao Jianhua | Wang Chenglong | Yan Xinrui

“Recent advancements in neural speech synthesis technologies have brought about widespread applications but have also raised concerns about potential misuse and abuse. Addressing these challenges is crucial, particularly in the realms of forensics and intellectual property protection. While previous research on source attribution of synthesized speech has its limitations, our study aims to fill these gaps by investigating the identification of sources in synthesized speech. We focus on analyzing speech synthesis model fingerprints in generated speech waveforms, emphasizing the roles of the acoustic model and vocoder. Our research, based on the multi-speaker LibriTTS dataset, reveals two key insights: (1) both vocoders and acoustic models leave distinct, model-specific fingerprints on generated waveforms, and (2) vocoder fingerprints, being more dominant, may obscure those from the acoustic model. These findings underscore the presence of model-specific fingerprints in both components, suggesting their potential significance in source identification applications.”

pdf bib
Knowledge Graph-Enhanced Recommendation with Box Embeddings
Liang Qiuyu | Wang Weihua | Lv Lei | Bao Feilong

“Knowledge graphs are used to alleviate the problems of data sparsity and cold starts in recommendation systems. However, most existing approaches ignore the hierarchical structure of the knowledge graph. In this paper, we propose a box embedding method for knowledge graph-enhanced recommendation systems. Specifically, box embeddings represent not only the interaction between the user and the item, but also the head entity, the tail entity, and the relation between them in the knowledge graph. The interaction between the item and the corresponding entity is then calculated by a multi-task attention unit. Experimental results show that our method provides a large improvement over previous models in terms of Area Under Curve (AUC) and accuracy on publicly available recommendation datasets from three different domains.”
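Box embeddings score pairs by geometric overlap, which is what lets them express hierarchy (a contained box overlaps its container fully). The abstract does not give the exact scoring function, so the following toy sketch of one common choice, the intersection volume of axis-aligned boxes, is illustrative only.

```python
import torch

def box_volume(lo, hi):
    """Volume of an axis-aligned box with corners lo <= hi."""
    return torch.clamp(hi - lo, min=0).prod(dim=-1)

def intersection_score(lo1, hi1, lo2, hi2):
    """Score two box embeddings by the volume of their intersection,
    a common choice in box-embedding models. Illustrative only; the
    paper's scoring function may differ."""
    lo = torch.maximum(lo1, lo2)
    hi = torch.minimum(hi1, hi2)
    return box_volume(lo, hi)

# A "user" box that contains an "item" box scores higher than a
# disjoint one, which is how hierarchy can be expressed.
user_lo, user_hi = torch.zeros(4), torch.ones(4)
item_lo, item_hi = torch.full((4,), 0.2), torch.full((4,), 0.6)
far_lo, far_hi = torch.full((4,), 2.0), torch.full((4,), 3.0)
print(intersection_score(user_lo, user_hi, item_lo, item_hi))  # > 0
print(intersection_score(user_lo, user_hi, far_lo, far_hi))    # 0
```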

pdf bib
Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese
Zhang Jingshen | Chen Xinglu | Qiu Xinying | Wang Zhimin | Feng Wenhe

“Chinese sentence simplification faces challenges due to the lack of large-scale labeled parallel corpora and the prevalence of idioms. To address these challenges, we propose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel framework that combines data augmentation techniques. RISS introduces two key components: (1) Readability-guided Paraphrase Selection (RPS), a method for mining high-quality sentence pairs, and (2) Idiom-aware Simplification (IAS), a model that enhances the comprehension and simplification of idiomatic expressions. By integrating RPS and IAS using multi-stage and multi-task learning strategies, RISS outperforms previous state-of-the-art methods on two Chinese sentence simplification datasets. Furthermore, RISS achieves additional improvements when fine-tuned on a small labeled dataset. Our approach demonstrates the potential for more effective and accessible Chinese text simplification.”

pdf bib
A Tone-based Hierarchical Structure of Chinese Prosody
Li Ya

“In Chinese speech engineering, many projects use a conventional, syllable-based prosodic hierarchy as an underlying framework to process natural or synthesized speech. However, Chinese as a tone language has its own way of expressing prosody, that is, through tonal interaction, especially tone sandhi. By utilizing the capacity of tone as a dual unit of pitch and timing, the present study proposes a tone-based, three-layer, four-level structure for Chinese prosody. The three layers are tone, tone prosody, and intonation, respectively composed of one level of pitch units, two levels of tone prosody units (basic and derived), and one level of intonation units. These four levels of units are used to replace the syllable, prosodic word, phonological phrase, and intonational phrase of a conventional hierarchy. Tone prosody units are established based on the sizes or types of tone sandhi domains, so when applied to the same clause uttered in Mandarin and Shanghai Wu Chinese, they are timed differently and branch in different directions at different levels, and are hence capable of capturing the rhythmic and melodic patterns of these two distinct types of Chinese. Overall, given its theory-friendly design, the proposed structure may be used as a unifying framework in Chinese speech engineering.”

pdf bib
Linguistic Guidance for Sequence-to-Sequence AMR Parsing
Tang Binghao | Lin Boda | Li Si

“Abstract Meaning Representation (AMR) parsing aims at capturing the meaning of a sentence in the form of an AMR graph. Sequence-to-sequence (seq2seq) methods, utilizing powerful Encoder-Decoder pre-trained language models (PLMs), have shown promising performance. Subsequent works have further improved the utilization of AMR graph information for seq2seq models. However, seq2seq models generate output sequences incrementally, and an inaccurate subsequence at the beginning can negatively impact the final output; moreover, the interconnection between other linguistic representation formats and AMR remains an underexplored domain in existing research. To mitigate the issue of error propagation and to investigate the guiding influence of other representation formats on PLMs, we propose a novel approach of Linguistic Guidance for Seq2seq AMR parsing (LGSA). Our proposed LGSA incorporates very limited information from various linguistic representation formats as guidance on the Encoder side, which can effectively push PLMs further toward their potential and boost AMR parsing. The results on the standard benchmarks AMR2.0 and AMR3.0 demonstrate the efficacy of LGSA, which can improve seq2seq AMR parsers without silver AMR data or alignment information. Moreover, we evaluate the generalization of LGSA by conducting experiments on out-of-domain datasets, and the results indicate that LGSA is effective even in such challenging scenarios.”

pdf bib
Automatic Construction of the English Sentence Pattern Structure Treebank for Chinese ESL learners
Zhu Lin | Xu Meng | Guo Wenya | Yu Jingsi | Yang Liner | Cao Zehuang | Huang Yuan | Yang Erhong

“Analyzing long and complicated sentences has always been a priority and a challenge in English learning. In order to parse these sentences for Chinese English-as-a-Second-Language (ESL) learners, we design the English Sentence Pattern Structure (ESPS) based on Sentence Diagramming theory. Then, we automatically construct the English Sentence Pattern Structure Treebank (ESPST) through rule-based conversion from constituency structure and evaluate the conversion results. In addition, we set up two comparative experiments, using a trained parser and large language models (LLMs). The results prove that the rule-based conversion approach is effective.”

pdf bib
Cost-efficient Crowdsourcing for Span-based Sequence Labeling: Worker Selection and Data Augmentation
Wang Yujie | Huang Chao | Yang Liner | Fang Zhixuan | Huang Yaping | Liu Yang | Yu Jingsi | Yang Erhong

“This paper introduces a novel crowdsourcing worker selection algorithm, enhancing annotation quality and reducing costs. Unlike previous studies targeting simpler tasks, this study contends with the complexities of label interdependencies in sequence labeling. The proposed algorithm utilizes a Combinatorial Multi-Armed Bandit (CMAB) approach for worker selection, together with a cost-effective human feedback mechanism. The challenge of dealing with imbalanced and small-scale datasets, which hinders offline simulation of worker selection, is tackled using an innovative data augmentation method termed shifting, expanding, and shrinking (SES). Rigorous testing on the CoNLL 2003 NER and Chinese OEI datasets showcased the algorithm’s efficiency, with an increase in F1 score up to 100.04% of the expert-only baseline, alongside cost savings up to 65.97%. The paper also encompasses a dataset-independent test emulating annotation evaluation through a Bernoulli distribution, which still led to an impressive 97.56% of the expert baseline’s F1 score and 59.88% cost savings. Furthermore, our approach can be seamlessly integrated into Reinforcement Learning from Human Feedback (RLHF) systems, offering a cost-effective solution for obtaining human feedback. All resources, including source code and datasets, are available to the broader research community at https://github.com/blcuicall/nlp-crowdsourcing.”
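A bare-bones combinatorial UCB selector conveys the flavor of CMAB-based worker selection: at each round, pick the k workers with the highest upper-confidence estimates of annotation quality, then update from observed rewards. The reward definition, exploration bonus, and feedback mechanism in the paper are more elaborate than this hypothetical sketch.

```python
import math
import random

class CMABWorkerSelector:
    """Combinatorial multi-armed bandit sketch for worker selection.
    Illustrative only; the paper's algorithm is more involved."""

    def __init__(self, n_workers, k):
        self.n, self.k = n_workers, k
        self.counts = [0] * n_workers
        self.means = [0.0] * n_workers
        self.t = 0

    def select(self):
        self.t += 1
        def ucb(i):
            if self.counts[i] == 0:
                return float("inf")      # explore unseen workers first
            bonus = math.sqrt(2 * math.log(self.t) / self.counts[i])
            return self.means[i] + bonus
        return sorted(range(self.n), key=ucb, reverse=True)[: self.k]

    def update(self, worker, reward):
        self.counts[worker] += 1
        # Incremental mean update of the worker's observed quality.
        self.means[worker] += (reward - self.means[worker]) / self.counts[worker]

# Simulation: worker quality is a hidden Bernoulli parameter.
random.seed(0)
quality = [0.9, 0.6, 0.3, 0.8, 0.5]
sel = CMABWorkerSelector(n_workers=5, k=2)
for _ in range(500):
    for w in sel.select():
        sel.update(w, 1.0 if random.random() < quality[w] else 0.0)
print(sorted(range(5), key=lambda i: sel.means[i], reverse=True)[:2])
```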

pdf bib
DLUE: Benchmarking Document Language Understanding
Xu Ruoxi | Lin Hongyu | Guan Xinyan | Sun Yingfei | Sun Le

“Understanding documents is central to many real-world tasks but remains a challenging topic. Unfortunately, there is no well-established consensus on how to comprehensively evaluate document understanding abilities, which significantly hinders fair comparison and measuring the progress of the field. To benchmark document understanding research, this paper summarizes four representative abilities, i.e., document classification, document structural analysis, document information extraction, and document transcription. Under this new evaluation framework, we propose the Document Language Understanding Evaluation (DLUE), a new task suite which covers a wide range of tasks in various forms, domains, and document genres. We also systematically evaluate six well-established transformer models and representative LLMs on DLUE, and find that, due to lengthy content, complicated underlying structure, and dispersed knowledge, document understanding is still far from being solved in complex real-world scenarios.”

pdf bib
Do Large Language Models Understand Conversational Implicature - A case study with a Chinese sitcom
Yue Shisen | Song Siyuan | Cheng Xinyuan | Hu Hai

“Understanding the non-literal meaning of an utterance is critical for large language models (LLMs) to become human-like social communicators. In this work, we introduce SwordsmanImp, the first Chinese multi-turn-dialogue-based dataset aimed at conversational implicature, sourced from dialogues in the Chinese sitcom My Own Swordsman. It includes 200 carefully handcrafted questions, all annotated for which Gricean maxims have been violated. We test eight closed-source and open-source LLMs on two tasks: a multiple-choice question task and an implicature explanation task. Our results show that GPT-4 attains human-level accuracy (94%) on the multiple-choice questions. CausalLM demonstrates 78.5% accuracy, following GPT-4. Other models, including GPT-3.5 and several open-source models, demonstrate lower accuracy, ranging from 20% to 60%, on the multiple-choice questions. Human raters were asked to rate the explanations of the implicatures generated by LLMs on their reasonability, logic, and fluency. While all models generate largely fluent and self-consistent text, their explanations score low on reasonability, except for GPT-4, suggesting that most LLMs cannot produce satisfactory explanations of the implicatures in the conversation. Moreover, we find that LLMs’ performance does not vary significantly by Gricean maxim, suggesting that LLMs do not process implicatures derived from different maxims differently. Our data and code are available at https://github.com/sjtu-compling/llm-pragmatics.”

pdf bib
EmoFake: An Initial Dataset for Emotion Fake Audio Detection
Zhao Yan | Yi Jiangyan | Tao Jianhua | Wang Chenglong | Dong Yongfeng

“To enhance the effectiveness of fake audio detection techniques, researchers have developed multiple datasets, such as those for the ASVspoof and ADD challenges. These datasets typically focus on capturing non-emotional characteristics in speech, such as the identity of the speaker and the authenticity of the content. However, they often overlook changes in the emotional state of the audio, which is another crucial dimension affecting the authenticity of speech. Therefore, this study reports our progress in developing EmoFake, an emotion fake audio detection dataset in which the emotional state of the original audio is changed. The audio samples in EmoFake are generated using open-source emotional voice conversion models, intended to simulate potential emotional tampering scenarios in real-world settings. We conducted a series of benchmark experiments on this dataset, and the results show that even advanced fake audio detection models trained on the ASVspoof 2019 LA dataset and the ADD 2022 track 3.2 dataset face challenges with EmoFake. EmoFake is now publicly available.”

pdf bib
Going Beyond Passages: Readability Assessment for Book-level Long Texts
Li Wenbiao | Sun Rui | Zhang Tianyi | Wu Yunfang

“Readability assessment for book-level long texts is widely needed in real educational applications. However, most current research focuses on passage-level readability assessment, and little work has been done on processing ultra-long texts. In order to better process the long sequences of book texts and to enhance pre-trained models with difficulty knowledge, we propose a novel model, DSDR, with difficulty-aware segment pre-training and difficulty multi-view representation. Specifically, we split all books into multiple fixed-length segments and employ unsupervised clustering to obtain difficulty-aware segments, which are used to re-train the pre-trained model to learn difficulty knowledge. Accordingly, a long text is represented by averaging multiple vectors of segments with varying difficulty levels. We construct a new dataset of Graded Children’s Books to evaluate model performance. Our proposed model achieves promising results, outperforming both the traditional SVM classifier and several popular pre-trained models. In addition, our work establishes a new prototype for book-level readability assessment, which provides an important benchmark for related research in future work.”

pdf bib
Mitigating the Bias of Large Language Model Evaluation
Zhou Hongli | Huang Hui | Long Yunfei | Xu Bing | Zhu Conghui | Cao Hailong | Yang Muyun | Zhao Tiejun

“Recently, there has been a trend of evaluating Large Language Model (LLM) output quality in the LLM-as-a-Judge style, namely leveraging another LLM to evaluate the current output quality. However, existing judges have been shown to be biased: they favor answers that present better superficial quality (such as verbosity and fluency) while ignoring instruction-following ability. In this work, we present a systematic study of the bias of LLM-as-a-Judge. Specifically, for closed-source judge models, we apply calibration to mitigate the influence of superficial quality, at both the probability level and the prompt level. For open-source judge models, we propose to mitigate the bias by contrastive training, with curated negative samples that deviate from the instruction but present better superficial quality. We apply our methods on the bias evaluation benchmark, and experimental results show that our methods mitigate the bias by a large margin while maintaining satisfactory evaluation accuracy.”

pdf bib
PPDAC: A Plug-and-Play Data Augmentation Component for Few-shot Extractive Question Answering
Huang Qi | Fu Han | Luo Wenbin | Wang Mingwen | Luo Kaiwei

“Extractive Question Answering (EQA) in the few-shot learning scenario is one of the most challenging tasks of Machine Reading Comprehension (MRC). Some previous works employ external knowledge for data augmentation to improve the performance of few-shot extractive question answering. However, external knowledge, and the language- and domain-specific NLP tools needed to process it, such as part-of-speech taggers, syntactic parsers, and named-entity recognizers, are not always available. In this paper, we present a novel Plug-and-Play Data Augmentation Component (PPDAC) for few-shot extractive question answering, which includes a paraphrase generator and a paraphrase selector. Specifically, we generate multiple paraphrases of the question in the (question, passage, answer) triples using the paraphrase generator and then obtain highly similar statements via the paraphrase selector to form more training data for fine-tuning. Extensive experiments on multiple EQA datasets show that our proposed plug-and-play data augmentation component significantly improves question-answering performance, and consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.”
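The paraphrase selector can be pictured as a similarity filter over generated candidates. In the following sketch the embedding model is replaced by random vectors and the threshold is arbitrary, so it illustrates the selection step only, not PPDAC's actual selector.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_paraphrases(question_vec, paraphrase_vecs, paraphrases,
                       threshold=0.85):
    """Keep paraphrases whose embedding similarity to the original
    question exceeds `threshold`. Illustrative sketch only."""
    return [
        p for p, v in zip(paraphrases, paraphrase_vecs)
        if cosine(question_vec, v) >= threshold
    ]

# Toy usage with random vectors standing in for sentence embeddings.
rng = np.random.default_rng(1)
q = rng.standard_normal(128)
cands = ["para 1", "para 2", "para 3"]
vecs = [q + 0.1 * rng.standard_normal(128),     # close paraphrase
        q + 0.1 * rng.standard_normal(128),     # close paraphrase
        rng.standard_normal(128)]               # unrelated sentence
print(select_paraphrases(q, vecs, cands))       # keeps the first two
```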

pdf bib
Sentence-Space Metrics (SSM) for the Evaluation of Sentence Comprehension
Lin Jieyu | Chen Honghua | Ding Nai

“It is a fundamental challenge to evaluate whether a model can truly capture the meaning of sentences. Whether a model captures the meaning of individual words, however, can be effectively evaluated by analyzing whether the model encodes words in a vector space where semantically similar words form clusters. Inspired by this approach, we propose the Sentence-Space Metrics (SSM) to evaluate model interpretation of sentences; the sentence space is constructed based on the pairwise entailment relationships between all sentence pairs within a sentence pool. We use three metrics to evaluate a sentence space, i.e., (1) sparsity, (2) clustering of related sentences, and (3) similarity with the sentence space measured from humans. The SSM is applied to evaluate 20 models, including ChatGPT, 18 BERT-family models fine-tuned for the Natural Language Inference (NLI) task, as well as SimCSE, a sentence representation model. The SSM reveals dramatic differences among the models: although all models achieve high accuracy on standard NLI datasets such as MNLI, none of them mirrors human behavior under the SSM. These results demonstrate that, compared with traditional accuracy measures, the SSM considers pairwise relationships between hundreds of sentences and therefore provides a more fine-grained evaluation of model interpretation of sentences.”
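To make the sentence-space construction concrete, here is a small sketch that builds a pairwise entailment matrix from an arbitrary probability function and computes one plausible reading of the sparsity metric. The paper's precise metric definitions may differ; the toy judge below is random rather than a real NLI model.

```python
import numpy as np

def entailment_matrix(prob_fn, sentences):
    """Pairwise entailment matrix over a sentence pool:
    M[i, j] = P(sentence i entails sentence j), where `prob_fn` is any
    callable returning that probability (e.g., an NLI model wrapper)."""
    n = len(sentences)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                m[i, j] = prob_fn(sentences[i], sentences[j])
    return m

def sparsity(m, threshold=0.5):
    """Fraction of sentence pairs judged non-entailing: one possible
    reading of the 'sparsity' metric, not necessarily the paper's."""
    off_diag = m[~np.eye(len(m), dtype=bool)]
    return float((off_diag < threshold).mean())

# Toy judge: random probabilities in place of a real NLI model.
rng = np.random.default_rng(2)
pool = ["s%d" % i for i in range(5)]
M = entailment_matrix(lambda a, b: rng.random(), pool)
print(sparsity(M))
```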

pdf bib
AuditWen: An Open-Source Large Language Model for Audit
Huang Jiajia | Zhu Haoran | Xu Chao | Zhan Tianming | Xie Qianqian | Huang Jimin

“Intelligent auditing represents a crucial advancement in modern audit practices, enhancing both the quality and efficiency of audits within the realm of artificial intelligence. With the rise of large language models (LLMs), there is enormous potential for intelligent models to contribute to the audit domain. However, general LLMs applied in the audit domain face the challenges of lacking specialized knowledge and of data biases. To overcome these challenges, this study introduces AuditWen, an open-source audit LLM built by fine-tuning Qwen on instruction data constructed from the audit domain. We first outline the application scenarios for LLMs in auditing and extract requirements that shape the development of LLMs tailored for audit purposes. We then build AuditWen by fine-tuning Qwen on a dataset of 30k instructions constructed from 15 audit tasks across 3 layers. For evaluation, we propose a benchmark of 5k instructions that covers a set of critical audit tasks derived from the application scenarios. On this benchmark, we compare AuditWen with other existing LLMs on information extraction, question answering, and document generation. The experimental results demonstrate the superior performance of AuditWen in both question understanding and answer generation, making it an immediately valuable tool for auditing.”

pdf bib
Chinese Grammatical Error Correction via Large Language Model Guided Optimization Training
Liu Xiao | Li Ying | Yu Zhengtao

“Pre-trained language model-based methods for Chinese Grammatical Error Correction (CGEC) are categorized into Seq2Seq and Seq2Edit types. However, both Seq2Seq and Seq2Edit models depend heavily on high-quality training data. Considering the strong generation and inference abilities of large language models (LLMs), we propose an LLM-guided optimization training method that exploits LLMs to extract error knowledge for optimizing the traditional CGEC model training process. On the one hand, we use error types and confusion sets as extra knowledge to guide LLMs to generate diverse pseudo data, thus extending the error distribution of our training data. On the other hand, LLMs are utilized to infer the predicted results from our CGEC models and obtain re-training data, thus iteratively optimizing our pre-trained CGEC models. Experiments on two benchmark datasets show that our LLM-guided optimization method with small-scale training data can achieve results comparable to baseline models with large-scale training data. Detailed comparison experiments demonstrate that both the early devised pseudo data and the later re-training data are extremely useful for traditional CGEC model optimization training, and the two can benefit from each other. We will release our code and prompts at https://github.com/SakuraAcedia/llm-cgec-got to facilitate future work.”

pdf bib
Pattern Shifting or Knowledge Losing? A Forgetting Perspective for Understanding the Effect of Instruction Fine-Tuning
Zhang Chunkang | Cao Boxi | Lu Yaojie | Lin Hongyu | Cao Liu | Zeng Ke | Wan Guanglu | Cai Xunliang | Han Xianpei | Sun Le

“Instruction Fine-Tuning (IFT) emerges as an essential step in training large language models to robustly carry out tasks of interest. However, there is a lack of systematic investigation into the underlying mechanisms of instruction fine-tuning, particularly the forgetting phenomenon after IFT, known as the alignment tax. Therefore, to understand the mechanism of IFT from the forgetting perspective, we investigate the alteration of text patterns and knowledge within models throughout the entire IFT process. Specifically, we restore fine-tuned models to their base version by training them on data sharing a similar distribution with the pre-training corpus, and we compare the results. Our experiments indicate that there is a stage transition of forgetting during the IFT process: (1) Pseudo Forgetting: in this stage, models mainly shift their familiar text pattern away from the pre-training data format while world knowledge is preserved. Consequently, models recover their original performance when restored to the base version. (2) Actual Forgetting: in this stage, models forget the acquired knowledge as well. Therefore, they fail to reach their original performance even when restored to the base version.”

pdf bib
Prior Constraints-based Reward Model Training for Aligning Large Language Models
Zhou Hang | Wang Chenglong | Hu Yimin | Xiao Tong | Zhang Chunliang | Zhu Jingbo

“Reinforcement learning with human feedback for aligning large language models (LLMs) trains a reward model, typically using a ranking loss with comparison pairs. However, the training procedure suffers from an inherent problem: the uncontrolled scaling of reward scores during reinforcement learning due to the lack of constraints while training the reward model. This paper proposes a Prior Constraints-based Reward Model (PCRM) training method to mitigate this problem. PCRM incorporates prior constraints, specifically the length ratio and the cosine similarity between the outputs of each comparison pair, during reward model training to regulate optimization magnitude and control score margins. We comprehensively evaluate PCRM by examining its rank correlation with human preferences and its effectiveness in aligning LLMs via RL. Experimental results demonstrate that PCRM significantly improves alignment performance by effectively constraining reward score scaling. As an additional benefit, our method is easily integrated into arbitrary rank-based alignment methods, such as direct preference optimization, and can yield consistent improvements. The code is available at https://github.com/wangclnlp/DeepSpeed-Chat-Extension/tree/PCRM.”
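One way to picture a prior-constrained ranking loss is to let the length ratio and output similarity shape the required score margin: very similar outputs should demand only a small margin, and a large length imbalance should not inflate it. The combination below is a hypothetical sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pcrm_style_loss(score_chosen, score_rejected,
                    len_chosen, len_rejected, cos_sim):
    """Ranking loss with a prior-derived margin, sketching the idea of
    constraining score gaps with length ratio and output similarity.
    This particular margin construction is illustrative only."""
    # Length ratio in (0, 1]: 1 means equal lengths, small means imbalance.
    length_ratio = torch.minimum(len_chosen, len_rejected) / torch.maximum(
        len_chosen, len_rejected)
    # Similar outputs (high cos_sim) get a small margin; length
    # imbalance shrinks the margin rather than inflating it.
    margin = (1.0 - cos_sim) * length_ratio
    return -F.logsigmoid(score_chosen - score_rejected - margin).mean()

# Toy usage: scores from a reward model head for 3 comparison pairs.
sc = torch.tensor([2.0, 1.5, 0.3])
sr = torch.tensor([1.0, 1.4, 0.2])
lc = torch.tensor([120.0, 80.0, 95.0])     # token counts (chosen)
lr = torch.tensor([100.0, 160.0, 90.0])    # token counts (rejected)
cos = torch.tensor([0.7, 0.9, 0.99])       # output similarity per pair
print(pcrm_style_loss(sc, sr, lc, lr, cos))
```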

pdf bib
Prompt Engineering 101: Prompt Engineering Guidelines from a Linguistic Perspective
Han Wenjuan | Wei Xiang | Cui Xingyu | Cheng Ning | Jiang Guangyuan | Qian Weinan | Zhang Chi

“Deploying tuning-free prompting is challenging in engineering practice: it not only requires users to engage in cumbersome trial and error but is also extremely time-consuming, as even a slight change in wording and phrasing can have a huge impact on final performance. To further investigate the impact of different prompts, in this work we perform a systematic inspection of four linguistic factors involved in prompt engineering: syntax, semantics, lexicon, and pragmatics. The empirical results quantify the sensitivity of the output to small textual perturbations in these four linguistic factors of prompts. Based on the analysis of these four factors, we present a series of design guidelines to help human users write effective prompts. A human evaluation with amateur users shows that following the proposed guidelines helps humans produce prompts with significant gains in zero-shot performance with Pre-trained Language Models (PLMs), validating the utility of the guidelines.”