基于推理链的多跳问答对抗攻击和对抗增强训练方法(Reasoning Chain Based Adversarial Attack and Adversarial Augmentation Training for Multi-hop Question Answering)
Jiayu Ding (丁佳玙) | Siyuan Wang (王思远) | Zhongyu Wei (魏忠钰) | Qin Chen (陈琴) | Xuanjing Huang (黄萱菁)


基于不完全标注的自监督多标签文本分类(Self-Training With Incomplete Labeling For Multi-Label Text Classification)
Junfei Ren (任俊飞) | Tong Zhu (朱桐) | Wenliang Chen (陈文亮)

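“Multi-label text classification (MLTC) aims to select one or more categories for a text from a predefined set of candidate labels, and is a fundamental task in natural language processing (NLP). Most previous work relies on well-curated, fully annotated datasets, which require strict quality control and are generally hard to obtain. In real annotation processes, some relevant labels are inevitably missed, which leads to the incomplete labeling problem. To address this, we propose Partial Self-Training (PST), a self-training framework based on partial annotation. The framework uses a teacher model to automatically assign pseudo-labels to large-scale unlabeled data and to fill in missing labels for incompletely labeled data, and then uses these data to update the teacher model in turn. Experiments on synthetic and real-world datasets show that the PST framework is compatible with existing multi-label text classification models and can mitigate the impact of incompletely labeled data on the model.”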
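
The label-completion step described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' PST implementation: the function name `complete_labels`, the fixed confidence `threshold`, and the rule "keep every annotated positive, add any label the teacher is confident about" are all assumptions made for the example.

```python
def complete_labels(observed, teacher_probs, threshold=0.9):
    """Fill in possibly missing labels for one example.

    observed: list of 0/1 flags; 1 is an annotated positive, 0 is either a
        true negative or a label the annotator missed.
    teacher_probs: per-label probabilities from a teacher model (hypothetical).
    threshold: confidence cut-off; an illustrative assumption, not a value
        taken from the paper.
    """
    if len(observed) != len(teacher_probs):
        raise ValueError("observed and teacher_probs must have the same length")
    # Annotated positives are always kept; confident teacher predictions
    # are added as pseudo-labels.
    return [1 if obs == 1 or prob >= threshold else 0
            for obs, prob in zip(observed, teacher_probs)]
```

Under these assumptions, calling `complete_labels([1, 0, 0], [0.20, 0.95, 0.40])` keeps the first label and adds the second. The completed labels would then serve as training targets when the teacher model is updated.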

融合汉越关联关系的多语言事件观点对象识别方法(A Multilingual Event Opinion Target Recognition Method Incorporating Chinese and Vietnamese Association Relations)
Gege Li (李格格) | Junjun Guo (郭军军) | Zhengtao Yu (余正涛) | Yan Xiang (相艳)


基于网络词典的现代汉语词义消歧数据集构建(Construction of a Modern Chinese Word Sense Dataset Based on Online Dictionaries)
Fukang Yan (严福康) | Yue Zhang (章岳) | Zhenghua Li (李正华)


基于多意图融合框架的联合意图识别和槽填充(A Multi-Intent Fusion Framework for Joint Intent Detection and Slot Filling)
Shangjian Yin (尹商鉴) | Peijie Huang (黄沛杰) | Dongzhu Liang (梁栋柱) | Zhuoqi He (何卓棋) | Qianer Li (黎倩尔) | Yuhong Xu (徐禹洪)


基于词频效应控制的神经机器翻译用词多样性增强方法(Improving Word-level Diversity in Neural Machine Translation by Controlling the Effects of Word Frequency)
Xuewen Shi (史学文) | Ping Jian (鉴萍) | Yikun Tang (唐翼琨) | Heyan Huang (黄河燕)

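“Neural machine translation (NMT) optimized by maximum likelihood estimation is prone to problems such as non-maximizable tokens and poor accuracy on low-frequency words, which leaves the generated translations lacking word-level diversity. The imbalanced distribution of word frequencies in the training data is one cause of this. This paper aims to alleviate the problem by limiting the influence of word frequency on the probabilities that NMT estimates during decoding. Specifically, we adopt a half-sibling regression denoising framework grounded in causal inference theory, combined with an adaptive denoising coefficient proposed in this paper, to control the effect of word frequency on the model's estimated probabilities, yielding more accurate probability estimates and richer word usage in NMT output. Experiments are conducted on four translation tasks representing different resource scales: Uyghur-Chinese, Chinese-English, English-German, and English-French. The results show that the proposed method improves the word-level diversity of NMT output without harming translation quality. The method is also model-agnostic and highly interpretable.”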

基于语音文本跨模态表征对齐的端到端语音翻译(End-to-end Speech Translation Based on Cross-modal Representation Alignment of Speech and Text)
Guojiang Zhou (周国江) | Ling Dong (董凌) | Zhengtao Yu (余正涛) | Shengxiang Gao (高盛祥) | Wenjun Wang (王文君) | Houli Ma (马候丽)


基于离散化自监督表征增强的老挝语非自回归语音合成方法(A Discretized Self-Supervised Representation Enhancement based Non-Autoregressive Speech Synthesis Method for Lao Language)
Zijian Feng (冯子健) | Linqin Wang (王琳钦) | Shengxiang Gao (高盛祥) | Zhengtao Yu (余正涛) | Ling Dong (董凌)


面向机器翻译的汉英小句复合体转换生成能力调查(Investigation of the Clause Complexes Transfer and Generation Capability from Chinese to English for Machine Translation)
Fukun Xing (邢富坤) | Jianing Xu (徐佳宁)


基于端到端预训练模型的藏文生成式文本摘要(Abstractive Summarization of Tibetan Based on end-to-end Pre-trained Model)
Shuo Huang (黄硕) | Xiaodong Yan (闫晓东) | Xinpeng OuYang (欧阳新鹏) | Jinpeng Yang (杨金鹏)

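“In recent years, pre-trained language models have attracted wide attention and have greatly advanced natural language processing across downstream tasks. Text summarization, an important branch of NLP, effectively reduces redundant information and thereby speeds up reading. Tibetan is a low-resource language that lacks large-scale training corpora, and research on Tibetan abstractive summarization is still in its infancy. To address this, we propose, for the first time, applying the end-to-end pre-trained language model CMPT (Chinese Minority Pre-Trained Language Model) to Tibetan abstractive summarization. CMPT is pre-trained with denoising and contrastive learning on text from other low-resource languages. In addition, to strengthen the encoder's understanding ability, a single-layer masked language model (MLM) decoder is added on top of the encoder output layer, so that Seq2Seq generation and understanding are pre-trained jointly. Further fine-tuning effectively improves performance on Tibetan text summarization. To validate the model, we run experiments on a self-built dataset of 50,000 Tibetan summarization examples and on the public dataset Ti-SUM. Results on both datasets show that the proposed method brings significant improvements on the evaluation metrics for Tibetan abstractive summarization. The method also extends beyond Tibetan to summarization in other languages and therefore generalizes well.”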

融合多粒度特征的缅甸语文本图像识别方法(Burmese Language Recognition Method Fused with Multi-Granularity Features)
Enyu He (何恩宇) | Rui Chen (陈蕊) | Cunli Mao (毛存礼) | Yuxin Huang (黄于欣) | Shengxiang Gao (高盛祥) | Zhengtao Yu (余正涛)

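“Burmese is a low-resource Southeast Asian language, and Burmese text image recognition is important for tasks such as Burmese machine translation. Because Burmese is a typical character-combining script, several nested characters can fall within a single receptive field. Existing Burmese recognition methods mostly work at the character granularity, so some characters are misrecognized during decoding, producing locally garbled output. Considering the special character combination rules of Burmese, we propose a Burmese text image recognition method that fuses multi-granularity features: it performs sequence modeling at both the finer character granularity and the coarser character-cluster granularity, fuses the two feature sequences, and decodes the result with a decoder. Experiments show that the method effectively reduces garbled recognition output and, on a manually constructed dataset, improves recognition accuracy by 2.4% over the “VGG16+BiLSTM+Transformer” baseline, reaching 97.35%.”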

TiKEM:基于知识增强的藏文预训练语言模型(TiKEM: Knowledge Enhanced Tibetan Pre-trained Language Model)
Junjie Deng (邓俊杰) | Long Chen (陈龙) | Yan Zhang (张廷) | Yuan Sun (孙媛) | Xiaobin Zhao (赵小兵)


TiKG-30K:基于表示学习的藏语知识图谱数据集(TiKG-30K: A Tibetan Knowledge Graph Dataset Based on Representation Learning)
Wenhao Zhuang (庄文浩) | Ge Gao (高歌) | Yuan Sun (孙媛)


噪声鲁棒的蒙古语语音数据增广模型结构(Noise robust Mongolian speech data augmentation model structure)
Zhiqiang Ma (马志强) | Jiaqi Sun (孙佳琦) | Jinyi Li (李晋益) | Jiatai Wang (王嘉泰)


基于数据增强的藏文机器阅读有难度问题的生成(Difficult Question Generation of Tibetan Machine Reading Based on Data Enhancement)
Zhengcuo Dan (旦正错) | Long Chen (陈龙) | Junjie Deng (邓俊杰) | Xian Pang (庞仙) | Yuan Sun (孙媛)


融合预训练模型的端到端语音命名实体识别(End-to-End Speech Named Entity Recognition with Pretrained Models)
Tianwei Lan (兰天伟) | Yuhang Guo (郭宇航)

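“Speech named entity recognition (SNER) aims to identify the boundaries, types, and content of named entities in audio, and is an important task in spoken language understanding. Recognizing named entities directly from speech, i.e., the end-to-end approach, is currently the mainstream method for SNER. However, training corpora for SNER are scarce, and end-to-end models have two problems: (1) recognition performance drops sharply in cross-domain settings; (2) homophones and similar phenomena cause the model to miss or mislabel named entities, further reducing accuracy. For problem (1), we propose using a pre-trained entity recognition model to construct training corpora for speech entity recognition. For problem (2), we propose rescoring the N-best list of SNER with a pre-trained language model, using the external knowledge in the pre-trained model to help the end-to-end model pick the best result. To verify domain transfer ability, we annotate a few-shot spoken-style dataset, MAGICDATA-NER; experiments on it show that the proposed method improves F1 by 43.29% over the traditional method.”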
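
N-best rescoring of the kind this abstract mentions can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the paper's system: `rescore_nbest`, the linear interpolation with `weight`, and the `lm_score` callable (standing in for a pre-trained language model that returns a log-probability) are all hypothetical names introduced here.

```python
def rescore_nbest(nbest, lm_score, weight=0.5):
    """Re-rank an N-best list with an external language-model score.

    nbest: list of (hypothesis_text, recognizer_log_prob) pairs.
    lm_score: callable mapping a hypothesis to a log-probability; it stands
        in for a pre-trained language model and is hypothetical.
    weight: interpolation weight for the LM score (illustrative assumption).
    Returns the hypothesis texts sorted best-first by the combined score.
    """
    scored = [(log_prob + weight * lm_score(text), text)
              for text, log_prob in nbest]
    # Higher combined log-probability ranks first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]
```

With a toy scorer, a hypothesis that the recognizer ranks second can move to the top once the language model strongly prefers it, which is the effect the rescoring step relies on to repair homophone errors.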

基于词向量的自适应领域术语抽取方法(An Adaptive Domain-Specific Terminology Extraction Approach Based on Word Embedding)
Xi Tang (唐溪) | Dongchen Jiang (蒋东辰) | Aoyuan Jiang (蒋翱远)


基于句法特征的事件要素抽取方法(Syntax-aware Event Argument Extraction)
Zijian Yu (余子健) | Tong Zhu (朱桐) | Wenliang Chen (陈文亮)

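“Event argument extraction (EAE) aims to extract the arguments participating in an event from unstructured text. The encoder-decoder framework is a common strategy for this task, but most previous studies feed only character and word information to the encoder, leaving models weak at generalization and at handling long-range dependencies. We therefore propose an event argument extraction model that incorporates syntactic information. First, the text is parsed into a constituency parse tree, and the part-of-speech tags and the syntactic constituent labels of each node are encoded to strengthen the model's text representation. Then, we propose a tree-structured attention mechanism (Tree-Attention) that helps the model better perceive structured semantic information and improves its handling of long-distance dependencies. Experiments show that the proposed method improves F1 by 2.02% over the baseline system, demonstrating its effectiveness.”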

相似音节增强的越汉跨语言实体消歧方法(Similar syllable enhanced cross-lingual entity disambiguation for Vietnamese-Chinese)
Yujuan Li (李裕娟) | Ran Song (宋燃) | Cunli Mao (毛存礼) | Yuxin Huang (黄于欣) | Shengxiang Gao (高盛祥) | Shan Lu (陆杉)


英汉动物词的认知属性计量研究(Quantitative studies of cognitive attributes of English and Chinese animal words)
Ling Hua (华玲) | Bin Li (李斌) | Minxuan Feng (冯敏萱) | Haibo Kuang (匡海波)


融合词典信息的古籍命名实体识别研究(A Study on the Recognition of Named Entities of Ancient Books Using Lexical Information)
Wenjun Kang (康文军) | Jiali Zuo (左家莉) | Anquan Jie (揭安全) | Wenbin Luo (罗文兵) | Mingwen Wang (王明文)


结合全局对应矩阵和相对位置信息的古汉语实体关系联合抽取(Joint Extraction of Ancient Chinese Entity Relations by Combining Global Correspondence Matrix and Relative Position Information)
Yiyu Hu (胡益裕) | Jiali Zuo (左家莉) | Xueqiang Zeng (曾雪强) | Zhongying Wan (万中英) | Mingwen Wang (王明文)


数字人文视域下的青藏高原文旅知识图谱构建研究——以塔尔寺为例(Research on the Construction of Cultural and Tourism Knowledge Atlas on the Qinghai-Tibet Plateau from the Perspective of Digital Humanity——A case study of Kumbum Monastery)
Xinhao Li (李鑫豪) | Weina Zhao (赵维纳) | Wanyi Zhao (赵婉亦) | Chaoqun Li (李超群)


基于互信息最大化和对比损失的多模态对话情绪识别模型(Multimodal Emotion Recognition in Conversation with Mutual Information Maximization and Contrastive Loss)
Qianer Li (黎倩尔) | Peijie Huang (黄沛杰) | Jiawei Chen (陈佳炜) | Jialin Wu (吴嘉林) | Yuhong Xu (徐禹洪) | Peiyuan Lin (林丕源)

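“Multimodal emotion recognition in conversation (ERC) is key to building emotional dialogue systems. In recent years, graph-based fusion methods have dynamically aggregated multimodal contextual features within a conversation, improving performance on multimodal ERC. However, these methods neither fully preserve nor fully exploit the valuable information in the input data. Specifically, they do not preserve task-relevant information from the input through to the fusion result, and they ignore the information carried by the labels themselves. To address these problems, we propose MMIC, a multimodal ERC model based on mutual information maximization and contrastive loss. The model maximizes the mutual information between modalities hierarchically, at the input level and at the fusion level, so that task-relevant information is preserved during fusion and richer multimodal representations are produced. We also introduce supervised contrastive learning into the graph-based dynamic fusion network; by making full use of label information, it pushes different emotions apart and strengthens the model's ability to distinguish similar emotions. Extensive experiments on two English and one Chinese public datasets demonstrate the effectiveness and superiority of the proposed model. In addition, case studies confirm that the model preserves task-relevant information and better distinguishes similar emotions, and ablation experiments and visualizations demonstrate the effectiveness of each module.”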

基于语义任务辅助的方面情感分析(Semantic Task-assisted Aspect-based Sentiment Analysis)
Zhaozhen Wu (吴肇真) | Hui Zhao (赵晖) | Tiquan Gu (谷体泉) | Guoyi Cao (曹国义)

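“Aspect-based sentiment analysis (ABSA) aims to determine the fine-grained sentiment polarity of the different aspects in a sentence. Effectively capturing the semantic information of the sentence is the key to this task. Most existing classification methods understand sentence semantics by introducing external knowledge and designing complex modules, while overlooking the noise from external parsers and the added model complexity. In this paper, we propose a multi-task learning network based on semantic understanding, which aims to capture sentence semantics from the original corpus through multi-task learning. From a multi-task perspective, and with shared parameters on the original dataset, we propose two semantic auxiliary tasks: aspect-context order prediction and aspect-context syntactic dependency prediction. The auxiliary tasks are then trained jointly with the original aspect sentiment classification task to obtain an encoder with enhanced semantic understanding, and this encoder is finally used for aspect sentiment classification. Experiments show that the model performs well in both accuracy and Macro-F1 on three major public datasets: Rest14, Lap14, and Twitter.”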

中国社会道德变化模型与发展动因探究——基于70年《人民日报》的计量与分析(The Model of Moral Change and Motivation in Chinese Society——The Vocabulary Analysis of the 70-year “People’s Daily”)
Hongrui Wang (王弘睿) | Dong Yu (于东) | Pengyuan Liu (刘鹏远) | Liying Zeng (曾立英)


动词视角下的汉语性别表征研究——基于多语体语料库与依存分析(Gendered Representation in Chinese via Verbal Analysis —Based on a Multi-register Corpus and Dependency Parsing)
Yingshi Chen (陈颖诗) | Dong Yu (于东) | Pengyuan Liu (刘鹏远)


基于多任务多模态交互学习的情感分类方法(Sentiment classification method based on multitasking and multimodal interactive learning)
Peng Xue (薛鹏) | Yang Li (李旸) | Suge Wang (王素格) | Jian Liao (廖健) | Jianxing Zheng (郑建兴) | Yujie Fu (符玉杰) | Deyu Li (李德玉)


基于动态常识推理与多维语义特征的幽默识别(Humor Recognition based on Dynamically Commonsense Reasoning and Multi-Dimensional Semantic Features)
Tuerxun Tunike (吐尔逊 吐妮可) | Hongfei Lin (林鸿飞) | Dongyu Zhang (张冬瑜) | Liang Yang (杨亮) | Changrong Min (闵昶荣)


融合Synonyms 词库的专利语义相似度计算研究(Patent Semantic Similarity Calculation by Fusing Synonyms Database)
Xinyu Tong (佟昕瑀) | Jialun Liao (廖佳伦) | Yonghe Lu (路永和)


中医临床切诊信息抽取与词法分析语料构建及联合建模方法(On Corpus Construction and Joint Modeling for Clinical Pulse Feeling and Palpation Information Extraction and Lexical Analysis of Traditional Chinese Medicine)
Yaqiang Wang (王亚强) | Wen Jiang (蒋文) | Yongguang Jiang (蒋永光) | Hongping Shu (舒红平)


大规模语言模型增强的中文篇章多维度阅读体验量化研究(Quantitative Research on Multi-dimensional Reading Experience of Chinese Texts Enhanced by Large Language Model)
Jiadai Sun (孙嘉黛) | Siyi Tang (汤思怡) | Shike Wang (王诗可) | Dong Yu (于东) | Pengyuan Liu (刘鹏远)


融合文本困惑度特征和相似度特征的推特机器人检测方法(Twitter robot detection method based on text perplexity feature and similarity feature)
Zhongjie Wang (王钟杰) | Zhaowen Zhang (张朝文) | Wenqi Ding (丁文琪) | Yumeng Fu (付雨濛) | Lili Shan (单丽莉) | Bingquan Liu (刘秉权)


差比句结构及其缺省现象的识别补全研究(A Study on Identification and Completion of Comparative Sentence Structures with Ellipsis Phenomenon)
Pengfei Zhou (周鹏飞) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Junsheng Zhou (周俊生) | Bin Li (李斌) | Yanhui Gu (顾彦慧)


基于框架语义场景图的零形式填充方法(A Null Instantiation Filling Method based Frame Semantic Scenario Graph)
Yuzhi Wang (王俞智) | Ru Li (李茹) | Xuefeng Su (苏雪峰) | Zhichao Yan (闫智超) | Juncai Li (李俊材)


基于FLAT的农业病虫害命名实体识别(Named Entity Recognition of Agricultural Pests and Diseases based on FLAT)
Yi Ren (任义) | Jie Shen (沈洁) | Shuai Yuan (袁帅)


基于结构树库的补语位形容词语义分析及搭配库构建(Semantic analysis of complementary adjectives and construction of collocation database based on structural tree library)
Siyu Tian (田思雨) | Tian Shao (邵田) | Endong Xun (荀恩东) | Gaoqi Rao (饶高琦)


基于BiLSTM聚合模型的汉语框架语义角色识别(Chinese Frame Semantic Role Identification Based on BiLSTM Aggregation Model)
Xuefei Cao (曹学飞) | Jihong Li (李济洪) | Ruibo Wang (王瑞波) | Qian Niu (牛倩)


L2到L1的跨语言激活路径研究——基于词汇识别的ERP数据(Cross-lingual Activation Path from L2 to L1——Based on ERP Data during Word Recognition)
Siqin Yang (杨思琴) | Minghu Jiang (江铭虎)

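“Cross-lingual lexical activation models are a hot topic in current research on language cognition and computation. This study uses event-related potentials (ERPs) to explore the paths by which second-language learners activate native-language (L1) lexical representations when recognizing second-language (L2) words. Two experiments were designed with an implicit priming paradigm: whether participants can perceive the repetition of translation equivalents, an implicit condition that can be detected only if L1 lexical representations are activated, is used to infer the activation outcome. The EEG results show that in Experiment 1, participants performing a semantic judgment task produced a significant N400 difference depending on whether translation equivalents were repeated, indicating that they activated L1 lexical representations via conceptual representations and thus demonstrating the existence of activation path Path-1 (L2>L1). In Experiment 2, participants performing an orthographic form judgment task likewise perceived the implicit translation-equivalent condition without semantic priming, indicating that L2 lexical representations can directly activate L1 lexical representations and thus demonstrating the existence of activation path Path-2 (L2>L1). Overall, the activation paths from L2 to L1 lexical representations during word recognition resemble those described for word production by the Revised Hierarchical Model (RHM). On this basis, the study conjectures that although the brain uses different processing mechanisms for word recognition and word production, the two still share certain commonalities in cross-lingual lexical activation.”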

汉语语义构词的资源建设与计算评估(Construction of Chinese Semantic Word-Formation and its Computing Applications)
Yue Wang (王悦) | Yang Liu (刘扬) | Qiliang Liang (梁启亮) | Hansi Wang (王涵思)


基于多尺度建模的端到端自动语音识别方法(An End-to-End Automatic Speech Recognition Method Based on Multiscale Modeling)
Hao Chen (陈昊) | Runlai Zhang (张润来) | Yuhao Zhang (张裕浩) | Chenghao Gao (高成浩) | Chen Xu (许晨) | Anxiang Ma (马安香) | Tong Xiao (肖桐) | Jingbo Zhu (朱靖波)


基于血缘关系结构的亲属关系推理算法研究(A Study on Kinship Inference Algorithm Based on Blood Relationship Structure)
Dawei Lu (卢达威) | Siqin Yang (杨思琴)


基于深加工语料库的《唐诗三百首》难度分级(The difficulty classification of ‘Three Hundred Tang Poems’ based on the deep processing corpus)
Yuyu Huang (黄宇宇) | Xinyu Chen (陈欣雨) | Minxuan Feng (冯敏萱) | Yunuo Wang (王禹诺) | Beiyuan Wang (王蓓原) | Bin Li (李斌)


基于RoBERTa的中文仇恨言论侦测方法研究(Chinese Hate Speech detection method Based on RoBERTa-WWM)
Xiaojun Rao (饶晓俊) | Yangsen Zhang (张仰森) | Shuang Peng (彭爽) | Qilong Jia (贾启龙) | Xueyang Liu (刘雪阳)


汉语被动结构解析及其在CAMR中的应用研究(Parsing of Passive Structure in Chinese and Its Application in CAMR)
Kang Hu (胡康) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Junsheng Zhou (周俊生) | Bin Li (李斌) | Yanhui Gu (顾彦慧)


人工智能生成语言与人类语言对比研究——以ChatGPT为例(A Comparative Study of Language between Artificial Intelligence and Human: A Case Study of ChatGPT)
Junhui Zhu (朱君辉) | Mengyan Wang (王梦焰) | Erhong Yang (杨尔弘) | Jinran Nie (聂锦燃) | Yujie Wang (王誉杰) | Yan Yue (岳岩) | Liner Yang (杨麟儿)


古汉语通假字资源库的构建及应用研究(The Construction and Application of an Ancient Chinese Language Resource on Tongjiazi)
Zhaoji Wang (王兆基) | Shirui Zhang (张诗睿) | Xuetao Zhang (张学涛) | Renfen Hu (胡韧奋)


SpaCE2022中文空间语义理解评测任务数据集分析报告(A Quality Assessment Report of the Chinese Spatial Cognition Evaluation Benchmark)
Liming Xiao (肖力铭) | Chunhui Sun (孙春晖) | Weidong Zhan (詹卫东) | Dan Xing (邢丹) | Nan Li (李楠) | Chengwen Wang (王诚文) | Fangwei Zhu (祝方韦)


基于预训练语言模型的端到端概念体系构建方法(End to End Taxonomy Construction Method with Pretrained Language Model)
Siyi Wang (王思懿) | Shizhu He (何世柱) | Kang Liu (刘康) | Jun Zhao (赵军)


Ask to Understand: Question Generation for Multi-hop Question Answering
Li Jiawei | Ren Mucheng | Gao Yang | Yang Yizhe

“Multi-hop Question Answering (QA) requires the machine to answer complex questions by finding scattered clues and reasoning over multiple documents. Graph Network (GN) and Question Decomposition (QD) are two common approaches at present. The former uses a “black-box” reasoning process to capture the potential relationships between entities and sentences, thus achieving good performance. The latter provides a clear logical reasoning route by decomposing multi-hop questions into simple single-hop sub-questions. In this paper, we propose a novel method to complete multi-hop QA from the perspective of Question Generation (QG). Specifically, we carefully design an end-to-end QG module on the basis of a classical QA module, which helps the model understand the context by asking inherently logical sub-questions, thus inheriting interpretability from QD-based methods and showing superior performance. Experiments on the HotpotQA dataset demonstrate the effectiveness of our proposed QG module, human evaluation further clarifies its interpretability quantitatively, and thorough analysis shows that the QG module can generate better sub-questions than QD methods in terms of fluency, consistency, and diversity.”

Learning on Structured Documents for Conditional Question Answering
Wang Zihan | Qian Hongjin | Dou Zhicheng

“Conditional question answering (CQA) is an important task in natural language processing that involves answering questions that depend on specific conditions. CQA is crucial for domains that require personalized advice or context-dependent analysis, such as legal consulting and medical diagnosis. However, existing CQA models struggle to generate multiple conditional answers because of two main challenges: (1) the lack of supervised training data with diverse conditions and corresponding answers, and (2) the difficulty of producing output in a complex format that involves multiple conditions and answers. To address the challenge of limited supervision, we propose LSD (Learning on Structured Documents), a self-supervised learning method on structured documents for CQA. LSD involves a conditional problem generation method and a contrastive learning objective. The model is trained with LSD on massive unlabeled structured documents and is afterwards fine-tuned on a labeled CQA dataset. To overcome the limitation of outputting answers with complex formats in CQA, we propose a pipeline that enables the generation of multiple answers and conditions. Experimental results on the ConditionalQA dataset demonstrate that LSD outperforms previous CQA models in accuracy, both in providing answers and in providing conditions.”

Overcoming Language Priors with Counterfactual Inference for Visual Question Answering
Ren Zhibo | Wang Huizhen | Zhu Muhua | Wang Yichao | Xiao Tong | Zhu Jingbo

“Recent years have seen a lot of efforts in attacking the issue of language priors in the field of Visual Question Answering (VQA). Among the extensive efforts, causal inference is regarded as a promising direction to mitigate language bias by weakening the direct causal effect of questions on answers. In this paper, we follow the same direction and attack the issue of language priors by incorporating counterfactual data. Moreover, we propose a two-stage training strategy which is deemed to make better use of counterfactual data. Experiments on the widely used benchmark VQA-CP v2 demonstrate the effectiveness of the proposed approach, which improves the baseline by 21.21% and outperforms most of the previous systems.”

Rethinking Label Smoothing on Multi-hop Question Answering
Yin Zhangyue | Wang Yuxin | Hu Xiannian | Wu Yiguang | Yan Hang | Zhang Xinyu | Cao Zhao | Huang Xuanjing | Qiu Xipeng

“Multi-Hop Question Answering (MHQA) is a significant area in question answering, requiring multiple reasoning components, including document retrieval, supporting sentence prediction, and answer span extraction. In this work, we present the first application of label smoothing to the MHQA task, aiming to enhance generalization capabilities in MHQA systems while mitigating overfitting of answer spans and reasoning paths in the training set. We introduce a novel label smoothing technique, F1 Smoothing, which incorporates uncertainty into the learning process and is specifically tailored for Machine Reading Comprehension (MRC) tasks. Moreover, we employ a Linear Decay Label Smoothing Algorithm (LDLA) in conjunction with curriculum learning to progressively reduce uncertainty throughout the training process. Experiments on the HotpotQA dataset confirm the effectiveness of our approach in improving generalization and achieving significant improvements, leading to new state-of-the-art performance on the HotpotQA leaderboard.”
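
The idea of reducing label-smoothing uncertainty over the course of training can be illustrated with a small sketch. It is not the paper's LDLA or F1 Smoothing: it uses plain uniform label smoothing, and the function names, the starting weight `eps_start`, and the choice to decay all the way to zero by the last epoch are assumptions made for the example.

```python
def linear_decay_epsilon(epoch, total_epochs, eps_start=0.1):
    """Smoothing weight that falls linearly from eps_start to 0.

    epoch is 0-based; eps_start and the zero end point are illustrative
    assumptions, not values from the paper.
    """
    if total_epochs <= 1:
        return 0.0
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return eps_start * (1.0 - frac)


def smooth_one_hot(gold_index, num_classes, epsilon):
    """Uniform label smoothing: the gold class gets 1 - epsilon and the
    remaining classes share epsilon equally."""
    if num_classes < 2:
        raise ValueError("label smoothing needs at least two classes")
    off_value = epsilon / (num_classes - 1)
    return [1.0 - epsilon if i == gold_index else off_value
            for i in range(num_classes)]
```

In a training loop one would compute `epsilon = linear_decay_epsilon(epoch, total_epochs)` at the start of each epoch and build the soft targets with `smooth_one_hot`, so that early epochs see softer targets and the final epoch trains on hard one-hot labels.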

Improving Zero-shot Cross-lingual Dialogue State Tracking via Contrastive Learning
Xiang Yu | Zhang Ting | Di Hui | Huang Hui | Li Chunyou | Ouchi Kazushige | Chen Yufeng | Xu Jinan

“Recent works in dialogue state tracking (DST) focus on a handful of languages, as collecting large-scale manually annotated data in different languages is expensive. Existing models address this issue by code-switched data augmentation or intermediate fine-tuning of multilingual pre-trained models. However, these models can only perform implicit alignment across languages. In this paper, we propose a novel model named Contrastive Learning for Cross-Lingual DST (CLCL-DST) to enhance zero-shot cross-lingual adaptation. Specifically, we use a self-built bilingual dictionary for lexical substitution to construct multilingual views of the same utterance. Then our approach leverages fine-grained contrastive learning to encourage representations of specific slot tokens in different views to be more similar than negative example pairs. By this means, CLCL-DST aligns similar words across languages into a more refined language-invariant space. In addition, CLCL-DST uses a significance-based keyword extraction approach to select task-related words to build the bilingual dictionary for better cross-lingual positive examples. Experiment results on Multilingual WoZ 2.0 and parallel MultiWoZ 2.1 datasets show that our proposed CLCL-DST outperforms existing state-of-the-art methods by a large margin, demonstrating the effectiveness of CLCL-DST.”
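
Building a second "view" of an utterance by dictionary-based lexical substitution might look like the sketch below. It is only an illustration under stated assumptions: `code_switch`, the per-token substitution probability `ratio`, the fixed `seed`, and the toy dictionary are hypothetical and are not taken from CLCL-DST.

```python
import random


def code_switch(tokens, bilingual_dict, ratio=0.5, seed=0):
    """Return a code-switched copy of `tokens`.

    Each token found in `bilingual_dict` is replaced by its translation with
    probability `ratio`; other tokens are kept. The ratio and the seeding
    scheme are illustrative assumptions.
    """
    rng = random.Random(seed)  # local generator keeps the view reproducible
    switched = []
    for token in tokens:
        if token in bilingual_dict and rng.random() < ratio:
            switched.append(bilingual_dict[token])
        else:
            switched.append(token)
    return switched
```

Because substitution is token-for-token, the original and switched utterances stay aligned position by position, which is what lets the representations of the same slot token in the two views be treated as a positive pair in a contrastive objective.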

Unsupervised Style Transfer in News Headlines via Discrete Style Space
Liu Qianhui | Gao Yang | Yang Yizhe

“The goal of headline style transfer in this paper is to make a headline more attractive while maintaining its meaning. The absence of parallel training data is one of the main problems in this field. In this work, we design a discrete style space for unsupervised headline style transfer, D-HST for short. This model decomposes style-dependent text generation into content-feature extraction and style modelling. The generation decoder then receives input from the content, the style, and their mixing components. In particular, we consider the textual style signal to be more abstract than the text itself. Therefore, we propose to model the style representation space as a discrete space, in which each discrete point corresponds to a particular category of styles that can be elicited by syntactic structure. Finally, we provide a new style-transfer dataset, named TechST, which focuses on transferring news headlines into those that are more eye-catching in technical social media. In the experiments, we develop two automatic evaluation metrics, style transfer rate (STR) and style-content trade-off (SCT), along with a few traditional criteria to assess the overall effectiveness of the style transfer. In addition, human evaluation is thoroughly conducted to assess generation quality, and creatively mimics a scenario in which a user clicks on appealing headlines to determine the click-through rate. Our results indicate that D-HST achieves state-of-the-art results in these comprehensive evaluations.”

Lexical Complexity Controlled Sentence Generation for Language Learning
Nie Jinran | Yang Liner | Chen Yun | Kong Cunliang | Zhu Junhui | Yang Erhong

“Language teachers spend a lot of time developing good examples for language learners. For this reason, we define a new task for language learning, lexical complexity controlled sentence generation, which requires precise control over lexical complexity in keywords-to-examples generation together with good fluency and semantic consistency. The challenge of this task is to generate fluent sentences using only words of given complexity levels. We propose a simple but effective approach for this task based on complexity embedding, while controlling sentence length and syntactic complexity at the decoding stage. Compared with potential solutions, our approach fuses the representations of the word complexity levels into the model to get better control of lexical complexity. We demonstrate the feasibility of the approach both for training models from scratch and for fine-tuning pre-trained models. To facilitate research, we develop two datasets, in English and Chinese respectively, on which extensive experiments are conducted. Experimental results show that our approach provides more precise control over lexical complexity, as well as better fluency and diversity.”

Dynamic-FACT: A Dynamic Framework for Adaptive Context-Aware Translation
Chen Linqing | Wang Weilei

“Document-level neural machine translation (NMT) has garnered considerable attention since the emergence of various context-aware NMT models. However, these static NMT models are trained on fixed parallel datasets, thus lacking awareness of the target document during inference. In order to alleviate this limitation, we propose a dynamic adapter-translator framework for context-aware NMT, which adapts the trained NMT model to the input document prior to translation. Specifically, the document adapter reconstructs the scrambled portion of the original document from a deliberately corrupted version, thereby reducing the performance disparity between training and inference. To achieve this, we employ an adaptation process in both the training and inference stages. Our experimental results on document-level translation benchmarks demonstrate significant enhancements in translation performance, underscoring the necessity of dynamic adaptation for context-aware translation and the efficacy of our methodologies.”

TERL: Transformer Enhanced Reinforcement Learning for Relation Extraction
Wang Yashen | Shi Tuo | Ouyang Xiaoye | Guo Dayu

“The Relation Extraction (RE) task aims to discover the semantic relation that holds between two entities and contributes to many applications such as knowledge graph construction and completion. Reinforcement Learning (RL) has been widely used for the RE task and has achieved SOTA results; such methods are mainly designed with rewards to choose the optimal actions during training and thereby improve RE performance, especially in low-resource conditions. Recent work has shown that offline or online RL can be flexibly formulated as a sequence understanding problem and solved via approaches similar to large-scale pre-trained language modeling. To strengthen the ability to understand the interactions among semantic signals in the given text sequence, this paper leverages the Transformer architecture for RL-based RE methods and proposes a generic framework called Transformer Enhanced RL (TERL) for the RE task. Unlike prior RL-based RE approaches that usually fit value functions or compute policy gradients, TERL only outputs the best actions by utilizing a masked Transformer. Experimental results show that the proposed TERL framework can improve many state-of-the-art RL-based RE methods.”

P-MNER: Cross Modal Correction Fusion Network with Prompt Learning for Multimodal Named Entity Recognition
Wang Zhuang | Zhang Yijia | An Kang | Zhou Xiaoying | Lu Mingyu | Lin Hongfei

“Multimodal Named Entity Recognition (MNER) is a challenging task in social media due to the combination of text and image features. Previous MNER work has focused on predicting entity information after fusing visual and text features. However, pre-trained language models have already acquired vast amounts of knowledge during their pre-training process. To leverage this knowledge, we propose a prompt network for MNER tasks (P-MNER). To minimize the noise generated by irrelevant areas in the image, we design a visual feature extraction model (FRR) based on FasterRCNN and ResNet, which uses fine-grained visual features to assist MNER tasks. Moreover, we introduce a text correction fusion module (TCFM) into the model to address visual bias during modal fusion. We employ the idea of a residual network to modify the fused features using the original text features. Our experiments on two benchmark datasets demonstrate that our proposed model outperforms existing MNER methods. P-MNER’s ability to leverage pre-training knowledge from language models, incorporate fine-grained visual features, and correct for visual bias makes it a promising approach for multimodal named entity recognition in social media posts.”

Self Question-answering: Aspect Sentiment Triplet Extraction via a Multi-MRC Framework based on Rethink Mechanism
Zhang Fuyao | Zhang Yijia | Wang Mengyi | Yang Hong | Lu Mingyu | Yang Liang

“The purpose of Aspect Sentiment Triplet Extraction (ASTE) is to extract a triplet comprising the target or aspect, its associated sentiment, and the related opinion terms that explain the underlying cause of the sentiment. Some recent studies fail to capture the strong interdependence between aspect term extraction (ATE) and opinion term extraction (OTE), while others fail to effectively introduce the relationship between aspects and opinions into sentiment classification tasks. To solve these problems, we construct a multi-round machine reading comprehension framework based on a rethink mechanism to solve ASTE tasks efficiently. The rethink mechanism allows the framework to model complex relationships between entities, and exclusive classifiers and probability generation algorithms can reduce query conflicts and unilateral drops in probability. Besides, the multi-round structure can fuse the explicit semantic information flow between aspect, opinion, and sentiment. Extensive experiments show that the proposed model achieves state-of-the-art performance and can be effectively applied to ASTE tasks.”

Enhancing Ontology Knowledge for Domain-Specific Joint Entity and Relation Extraction
Xiong Xiong | Wang Chen | Liu Yunfei | Li Shengyang

“Pre-trained language models (PLMs) have been widely used in entity and relation extraction methods in recent years. However, due to the semantic gap between general-domain text used for pre-training and domain-specific text, these methods encounter semantic redundancy and domain semantics insufficiency when it comes to domain-specific tasks. To mitigate this issue, we propose a low-cost and effective knowledge-enhanced method to facilitate domain-specific semantics modeling in joint entity and relation extraction. Precisely, we use ontology and entity type descriptions as domain knowledge sources, which are encoded and incorporated into the downstream entity and relation extraction model to improve its understanding of domain-specific information. We construct a dataset called SSUIE-RE for Chinese entity and relation extraction in the space science and utilization domain of China Manned Space Engineering, which contains a wealth of domain-specific knowledge. The experimental results on SSUIE-RE demonstrate the effectiveness of our method, achieving a 1.4% absolute improvement in relation F1 score over the previous best approach.”

Document Information Extraction via Global Tagging
He Shaojie | Wang Tianshu | Lu Yaojie | Lin Hongyu | Han Xianpei | Sun Yingfei | Sun Le

“Document Information Extraction (DIE) is a crucial task for extracting key information from visually-rich documents. The typical pipeline approach for this task involves Optical Character Recognition (OCR), serializer, Semantic Entity Recognition (SER), and Relation Extraction (RE) modules. However, this pipeline presents significant challenges in real-world scenarios due to issues such as unnatural text order and error propagation between different modules. To address these challenges, we propose a novel tagging-based method, Global TaggeR (GTR), which converts the original sequence labeling task into a token relation classification task. This approach globally links discontinuous semantic entities in complex layouts, and jointly extracts entities and relations from documents. In addition, we design a joint training loss and a joint decoding strategy for SER and RE tasks based on GTR. Our experiments on multiple datasets demonstrate that GTR not only mitigates the issue of text in the wrong order but also improves RE performance.”

A Distantly-Supervised Relation Extraction Method Based on Selective Gate and Noise Correction
Chen Zhuowei | Tian Yujia | Wang Lianxi | Jiang Shengyi

“Entity relation extraction, a core task of information extraction, aims to predict the relation between entity pairs identified in text, and its research results are applied in various fields. Current distantly supervised relation extraction (DSRE) methods based on large-scale corpus annotation generate a large amount of noisy data. To address this problem, we propose a DSRE method that incorporates a selective gate and a noise correction framework. The selective gate is used to reasonably select sentence features in the sentence bag, while noise correction is used to correct the labels of small-class samples that are misclassified into large classes during model training, reducing the negative impact of noisy data on relation extraction. Results on the English datasets clearly demonstrate that the proposed method outperforms other baseline models. Moreover, experimental results on the Chinese dataset indicate that our method surpasses other models, providing further evidence that it is both robust and effective.”

Improving Cascade Decoding with Syntax-aware Aggregator and Contrastive Learning for Event Extraction
Sheng Zeyu | Liang Yuanyuan | Lan Yunshi

“The cascade decoding framework has shown superior performance on event extraction tasks. However, it treats a sentence as a sequence and neglects the potential benefits of the syntactic structure of sentences. In this paper, we improve cascade decoding with a novel module and a self-supervised task. Specifically, we propose a syntax-aware aggregator module to model the syntax of a sentence based on the cascade decoding framework such that it captures event dependencies as well as syntactic information. Moreover, we design a type discrimination task to learn better syntactic representations of different event types, which could further boost the performance of event extraction. Experimental results on two widely used event extraction datasets demonstrate that our method could improve the original cascade decoding framework by up to 2.2 percentage points of F1 score and outperform a number of competitive baseline methods.”

Learnable Conjunction Enhanced Model for Chinese Sentiment Analysis
Zhao Bingfei | Zan Hongying | Wang Jiajia | Han Yingjie

“Sentiment analysis is a crucial text classification task that aims to extract, process, and analyze opinions, sentiments, and subjectivity within texts. In current research on Chinese text, sentence and aspect-based sentiment analysis is mainly tackled through well-designed models. However, despite the importance of word order and function words as essential means of semantic expression in Chinese, they are often underutilized. This paper presents a new Chinese sentiment analysis method that utilizes a Learnable Conjunctions Enhanced Model (LCEM). The LCEM adjusts the general structure of the pre-trained language model and incorporates conjunctions location information into the model’s fine-tuning process. Additionally, we discuss a variant structure of residual connections to construct a residual structure that can learn critical information in the text and optimize it during training. We perform experiments on the public datasets and demonstrate that our approach enhances performance on both sentence and aspect-based sentiment analysis datasets compared to the baseline pre-trained language models. These results confirm the effectiveness of our proposed method.”

Improving Affective Event Classification with Multi-Perspective Knowledge Injection
Yi Wenjia | Zhao Yanyan | Yuan Jianhua | Zhao Weixiang | Qin Bing

“In recent years, many researchers have recognized the importance of associating events with sentiments. Previous approaches focus on generalizing events and extracting sentimental information from a large-scale corpus. However, since context is absent and sentiment is often implicit in the event, these methods are limited in comprehending the semantics of the event and capturing effective sentimental clues. In this work, we propose a novel Multi-perspective Knowledge-injected Interaction Network (MKIN) to fully understand the event and accurately predict its sentiment by injecting multi-perspective knowledge. Specifically, we leverage contexts to provide sufficient semantic information and perform context modeling to capture the semantic relationships between events and contexts. Moreover, we also introduce human emotional feedback and sentiment-related concepts to provide explicit sentimental clues from the perspective of human emotional state and word meaning, filling the reasoning gap in the sentiment prediction process. Experimental results on the gold standard dataset show that our model achieves better performance over the baseline models.”

Enhancing Implicit Sentiment Learning via the Incorporation of Part-of-Speech for Aspect-based Sentiment Analysis
Wang Junlang | Li Xia | He Junyi | Zheng Yongqiang | Ma Junteng

“Implicit sentiment modeling in aspect-based sentiment analysis is a challenging problem due to complex expressions and the lack of opinion words in sentences. Recent efforts focusing on implicit sentiment in ABSA mostly leverage the dependency between aspects and pretrain on extra annotated corpora. We argue that linguistic knowledge can be incorporated into the model to better learn implicit sentiment knowledge. In this paper, we propose a PLM-based, linguistically enhanced framework by incorporating Part-of-Speech (POS) for aspect-based sentiment analysis. Specifically, we design an input template for PLMs that focuses on both aspect-related contextualized features and POS-based linguistic features. By aligning with the representations of the tokens and their POS sequences, the introduced knowledge is expected to guide the model in learning implicit sentiment by capturing sentiment-related information. Moreover, we also design an aspect-specific self-supervised contrastive learning strategy to optimize aspect-based contextualized representation construction and assist PLMs in concentrating on target aspects. Experimental results on public benchmarks show that our model can achieve competitive and state-of-the-art performance without introducing extra annotated corpora.”

Case Retrieval for Legal Judgment Prediction in Legal Artificial Intelligence
Zhang Han | Dou Zhicheng

“Legal judgment prediction (LJP) is a basic task in legal artificial intelligence. It consists of three subtasks, which are relevant law article prediction, charge prediction and term of penalty prediction, and gives the judgment results to assist the work of judges. In recent years, many deep learning methods have emerged to improve the performance of the legal judgment prediction task. The previous methods mainly improve the performance by integrating law articles and the fact description of a legal case. However, they rarely consider that the judges usually look up historical cases before making a judgment in the actual scenario. To simulate this scenario, we propose a historical case retrieval framework for the legal judgment prediction task. Specifically, we select some historical cases which include all categories from the training dataset. Then, we retrieve the most similar Top-k historical cases of the current legal case and use the vector representation of these Top-k historical cases to help predict the judgment results. On two real-world legal datasets, our model achieves better results than several state-of-the-art baseline models.”
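
The retrieval step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or aggregation is used, so cosine similarity and a simple mean over the retrieved vectors are assumptions here, and the names `cosine`, `retrieve_top_k`, and `mean_vector` are hypothetical rather than taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def retrieve_top_k(query_vec, case_vecs, k):
    """Return indices of the k historical cases most similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(case_vecs)]
    # Sort by descending similarity; break ties by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


def mean_vector(vectors):
    """Average the retrieved case vectors into one auxiliary feature vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Toy example: three historical cases encoded as 2-d vectors.
historical = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
current = [1.0, 0.1]
top = retrieve_top_k(current, historical, k=2)
auxiliary = mean_vector([historical[i] for i in top])
```

In a full system the auxiliary vector would presumably be combined with the encoding of the current case before the prediction layers; how the paper does this is not stated in the abstract.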

SentBench: Comprehensive Evaluation of Self-Supervised Sentence Representation with Benchmark Construction
Liu Xiaoming | Lin Hongyu | Han Xianpei | Sun Le

“Self-supervised learning has been widely used to learn effective sentence representations. Previous evaluation of sentence representations mainly focuses on the limited combination of tasks and paradigms while failing to evaluate their effectiveness in a wider range of application scenarios. Such divergences prevent us from understanding the limitations of current sentence representations, as well as the connections between learning approaches and downstream applications. In this paper, we propose SentBench, a new comprehensive benchmark to evaluate sentence representations. SentBench covers 12 kinds of tasks and evaluates sentence representations with three types of different downstream application paradigms. Based on SentBench, we re-evaluate several frequently used self-supervised sentence representation learning approaches. Experiments show that SentBench can effectively evaluate sentence representations from multiple perspectives, and the performance on SentBench leads to some novel findings which enlighten future research.”

Adversarial Network with External Knowledge for Zero-Shot Stance Detection
Wang Chunling | Zhang Yijia | Yu Xingyu | Liu Guantong | Chen Fei | Lin Hongfei

“Zero-shot stance detection intends to detect previously unseen targets’ stances in the testing phase. However, achieving this goal can be difficult, as it requires minimizing the domain transfer between different targets, and improving the model’s inference and generalization abilities. To address this challenge, we propose an adversarial network with external knowledge (ANEK) model. Specifically, we adopt adversarial learning based on pre-trained models to learn transferable knowledge from the source targets, thereby enabling the model to generalize well to unseen targets. Additionally, we incorporate sentiment information and common sense knowledge into the contextual representation to further enhance the model’s understanding. Experimental results on several datasets reveal that our method achieves excellent performance, demonstrating its validity and feasibility.”

The Contextualized Representation of Collocation
Liu Daohuan | Tang Xuri

“Collocate list and collocation network are two widely used representation methods of collocations, but they have significant weaknesses in representing contextual information. To solve this problem, we propose a new representation method, namely the contextualized representation of collocate (CRC), which highlights the importance of the position of the collocates and pins a collocate as the interaction of two dimensions: association strength and co-occurrence position. With a full image of all the collocates surrounding the node word, CRC carries the contextual information and makes the representation more informative and intuitive. Through three case studies, i.e., synonym distinction, image analysis, and efficiency in lexical use, we demonstrate the advantages of CRC in practical applications. CRC is also a new quantitative tool to measure lexical usage pattern similarities for corpus-based research. It can provide a new representation framework for language researchers and learners.”

Training NLI Models Through Universal Adversarial Attack
Lin Jieyu | Liu Wei | Zou Jiajie | Ding Nai

“Pre-trained language models are sensitive to adversarial attacks, and recent works have demonstrated universal adversarial attacks that can apply input-agnostic perturbations to mislead models. Here, we demonstrate that universal adversarial attacks can also be used to harden NLP models. Based on the NLI task, we propose a simple universal adversarial attack that can mislead models to produce the same output for all premises by replacing the original hypothesis with an irrelevant string of words. To defend against this attack, we propose Training with UNiversal Adversarial Samples (TUNAS), which iteratively generates universal adversarial samples and utilizes them for fine-tuning. The method is tested on two datasets, i.e., MNLI and SNLI. It is demonstrated that TUNAS can reduce the mean success rate of the universal adversarial attack from above 79% to below 5%, while maintaining similar performance on the original datasets. Furthermore, TUNAS models are also more robust to attacks targeting individual samples: when searching for hypotheses that are best entailed by a premise, the hypotheses found by TUNAS models are more compatible with the premise than those found by baseline models. In sum, we use universal adversarial attack to yield more robust models.”

MCLS: A Large-Scale Multimodal Cross-Lingual Summarization Dataset
Shi Xiaorui

“Multimodal summarization, which aims to generate summaries with multimodal inputs, e.g., text and visual features, has attracted much attention in the research community. However, previous studies only focus on monolingual multimodal summarization and neglect non-native readers' need to understand cross-lingual news in practical applications. This inspires us to present a new task, named Multimodal Cross-Lingual Summarization for news (MCLS), which generates cross-lingual summaries from multi-source information. To this end, we present a large-scale multimodal cross-lingual summarization dataset, which consists of 1.1 million article-summary pairs with 3.4 million images in 44 * 43 language pairs. To generate a summary in any language, we propose a unified framework that jointly trains the multimodal monolingual and cross-lingual summarization tasks, where a bi-directional knowledge distillation approach is designed to transfer knowledge between both tasks. Extensive experiments on many-to-many settings show the effectiveness of the proposed model.”

CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection
Wei Congcong | Feng Zhenbing | Huang Shutan | Li Wei | Shao Yanqiu

“Event detection (ED) is a crucial area of natural language processing that automates the extraction of specific event types from large-scale text, and studying historical ED in classical Chinese texts helps preserve and inherit historical and cultural heritage by extracting valuable information. However, classical Chinese language characteristics, such as ambiguous word classes and complex semantics, have posed challenges and led to a lack of datasets and limited research on event schema construction. In addition, large-scale datasets in English and modern Chinese are not directly applicable to historical ED in classical Chinese. To address these issues, we constructed a logical event schema for classical Chinese historical texts and annotated the resulting dataset, which is called classical Chinese Historical Event Dataset (CHED). The main challenges in our work on classical Chinese historical ED are accurately identifying and classifying events within cultural and linguistic contexts and addressing ambiguity resulting from multiple meanings of words in historical texts. Therefore, we have developed a set of annotation guidelines and provided annotators with an objective reference translation. The average Kappa coefficient after multiple cross-validation is 68.49%, indicating high quality and consistency. We conducted various tasks and comparative experiments on established baseline models for historical ED in classical Chinese. The results showed that BERT+CRF had the best performance on sequence labeling task, with an F1-score of 76.10%, indicating potential for further improvement.”
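
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of a pairwise Kappa computation. The abstract does not say which Kappa variant was used or how the averaging was done, so Cohen's kappa for two annotators is an assumption, and the function name `cohen_kappa` and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy example: "E" marks an event trigger, "O" marks a non-trigger token.
annotator_1 = ["O", "E", "E", "O"]
annotator_2 = ["O", "E", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the observed agreement is 0.75 and the chance agreement is 0.5, so the kappa is 0.5. An average over annotator pairs or validation rounds, as the abstract suggests, would simply take the mean of such values.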

Revisiting k-NN for Fine-tuning Pre-trained Language Models
Li Lei | Chen Jing | Tian Bozhong | Zhang Ningyu

“Pre-trained Language Models (PLMs), as parametric-based eager learners, have become the de-facto choice for current paradigms of Natural Language Processing (NLP). In contrast, k-Nearest-Neighbor (k-NN) classifiers, as the lazy learning paradigm, tend to mitigate over-fitting and isolated noise. In this paper, we revisit k-NN classifiers for augmenting the PLMs-based classifiers. From the methodological level, we propose to adopt k-NN with textual representations of PLMs in two steps: (1) Utilize k-NN as prior knowledge to calibrate the training process. (2) Linearly interpolate the probability distribution predicted by k-NN with that of the PLMs’ classifier. At the heart of our approach is the implementation of k-NN-calibrated training, which treats predicted results as indicators for easy versus hard examples during the training process. From the perspective of the diversity of application scenarios, we conduct extensive experiments on fine-tuning, prompt-tuning paradigms and zero-shot, few-shot and fully-supervised settings, respectively, across eight diverse end-tasks. We hope our exploration will encourage the community to revisit the power of classical methods for efficient NLP.”
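
Step (2) of this abstract, linear interpolation of the k-NN distribution with the classifier distribution, can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the k-NN distribution is formed here from unweighted neighbor votes under squared Euclidean distance, and the names `knn_distribution`, `interpolate`, and the weight `lam` are hypothetical.

```python
from collections import Counter


def knn_distribution(query, memory, k, num_classes):
    """Class distribution from the labels of the k nearest stored examples.

    `memory` is a list of (vector, label) pairs; neighbors are ranked by
    squared Euclidean distance and each neighbor casts one vote (assumed).
    """
    if k <= 0 or not memory:
        raise ValueError("need k > 0 and a non-empty memory")
    ranked = sorted(
        memory,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(query, item[0])),
    )
    votes = Counter(label for _, label in ranked[:k])
    total = sum(votes.values())
    return [votes.get(c, 0) / total for c in range(num_classes)]


def interpolate(p_knn, p_plm, lam):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_plm."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_knn, p_plm)]


# Toy example: 2-d "representations" with binary labels.
memory = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([5.0, 5.0], 1), ([6.0, 5.0], 1)]
p_knn = knn_distribution([0.0, 0.2], memory, k=3, num_classes=2)
p_plm = [0.5, 0.5]  # stand-in for the PLM classifier's softmax output
p_final = interpolate(p_knn, p_plm, lam=0.5)
```

Because both inputs are probability distributions and the weights sum to one, the mixture is itself a valid distribution; setting `lam` to 0 recovers the classifier alone.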

Adder Encoder for Pre-trained Language Model
Ding Jianbang | Zhang Suiyun | Li Linlin

“BERT, a pre-trained language model entirely based on attention, has proven to be highly performant for many natural language understanding tasks. However, pre-trained language models (PLMs) are often computationally expensive and can hardly be implemented with limited resources. To reduce energy burden, we introduce adder operations into the Transformer encoder and propose a novel AdderBERT with powerful representation capability. Moreover, we adopt mapping-based distillation to further improve its energy efficiency with an assured performance. Empirical results demonstrate that AdderBERT6 achieves highly competitive performance against that of its teacher BERTBASE on the GLUE benchmark while obtaining a 4.9x reduction in energy consumption.”

FinBART: A Pre-trained Seq2seq Language Model for Chinese Financial Tasks
Dong Hongyuan | Che Wanxiang | He Xiaoyu | Zheng Guidong | Wen Junjie

“Pretrained language models are making a more profound impact on our lives than ever before. They exhibit promising performance on a variety of general domain Natural Language Processing (NLP) tasks. However, few works focus on Chinese financial NLP tasks, which comprise a significant portion of social communication. To this end, we propose FinBART, a pretrained seq2seq language model for Chinese financial communication tasks. Experiments show that FinBART outperforms baseline models on a series of downstream tasks including text classification, sequence labeling and text generation. We further pretrain the model on customer service corpora, and results show that our model outperforms baseline models and achieves promising performance on various real world customer service text mining tasks.”

Exploring Accurate and Generic Simile Knowledge from Pre-trained Language Models
Zhou Shuhan | Ma Longxuan | Shao Yanqiu

“A simile is an important linguistic phenomenon in daily communication and an important task in natural language processing (NLP). In recent years, pre-trained language models (PLMs) have achieved great success in NLP since they learn generic knowledge from a large corpus. However, PLMs still have hallucination problems that they could generate unrealistic or context-unrelated information. In this paper, we aim to explore more accurate simile knowledge from PLMs. To this end, we first fine-tune a single model to perform three main simile tasks (recognition, interpretation, and generation). In this way, the model gains a better understanding of the simile knowledge. However, this understanding may be limited by the distribution of the training data. To explore more generic simile knowledge from PLMs, we further add semantic dependency features in three tasks. The semantic dependency feature serves as a global signal and helps the model learn simile knowledge that can be applied to unseen domains. We test with seen and unseen domains after training. Automatic evaluations demonstrate that our method helps the PLMs to explore more accurate and generic simile knowledge for downstream tasks. Our method of exploring more accurate knowledge is not only useful for simile study but also useful for other NLP tasks leveraging knowledge from PLMs. Our code and data will be released on GitHub.”