Qi Huang


2025

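In recent years, non-autoregressive image captioning has attracted wide attention for its bidirectional propagation and parallel word generation. Meanwhile, research on discrete diffusion methods has also made notable progress. However, in the discrete noising and denoising process, existing methods still suffer from key problems such as weak image-text relevance, omitted target objects, inaccurate descriptions, and repeated words. To address these challenges, we propose a semantic-aware discrete diffusion model. The model builds a semantic-aware module through a learnable query mechanism to capture latent associations with the object-level semantic features of the image, and thereby generates better captions. On top of this base model, we further introduce a self-prompting optimization framework that uses a large language model to generate rich descriptions that better match the detailed content of the image. Comprehensive experiments on the COCO dataset show that our method achieves some improvement on the image captioning task and outperforms existing related methods.
Knowledge graph reasoning (KGR) aims to infer and discover new facts by mining and applying the logical rules contained in a knowledge graph. The task is widely used in intelligent question answering, semantic search, and recommender systems. In recent years, because embedding-based knowledge graph reasoning algorithms lack interpretability, some researchers have turned to rule-based knowledge graph reasoning methods. However, when interpreting relation semantics, existing rule-based reasoning methods struggle to handle implicit associations between relations and easily fall into local optima. We therefore propose ReSA, a rule mining model enhanced with relation structure awareness. The method constructs a relation graph to explicitly model the hierarchical structure among relations, improving the efficiency of rule mining. ReSA also combines global semantic modeling with local structural modeling through a global rule fusion module and a relative relation encoder, strengthening the model's awareness of the overall logic of the rule body. Experiments show that ReSA achieves significant performance gains on datasets such as WN18RR, improving MRR by 4 percentage points over the latest existing rule mining methods.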
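For readers unfamiliar with the MRR metric cited in the ReSA abstract, the following is a minimal sketch of how mean reciprocal rank is conventionally computed for link prediction. The function name and the example ranks are hypothetical and are not taken from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Return the mean of 1/rank over all test queries.

    `ranks` holds the 1-based rank of the correct entity for each query.
    The values used below are made up for illustration only.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct answer for three queries.
example_ranks = [1, 2, 4]
mrr = mean_reciprocal_rank(example_ranks)
```

Under this definition, a gain of 4 percentage points corresponds to, for example, an MRR moving from 0.50 to 0.54.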

2024

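Few-shot named entity recognition aims to recognize named entities using a small amount of labeled data. Inspired by the strong performance of prompt learning in few-shot scenarios in recent years, this paper explores prompt-based methods for few-shot named entity recognition. Existing prompt-learning-based methods recognize entities by enumerating all possible spans, which leads to high computational cost and underuse of entity boundary information. This paper proposes TSP-Few, a two-stage framework based on prompt learning that performs few-shot named entity recognition without using source-domain data. The first stage enhances, filters, and expands seed spans: the seed enhancement module lets seed spans capture richer semantic information, the seed filter removes a large number of irrelevant spans, and the seed expansion module makes full use of entity boundary information, providing high-quality candidate entity spans for entity type classification. The second stage uses prompt learning to predict the category of each candidate span. In addition, to mitigate error accumulation from the span detection stage, a negative sampling strategy is introduced in the entity classification stage. Training the span detection and entity type classification tasks independently makes it easier to achieve strong performance in few-shot settings. Experiments on three benchmark datasets show that the proposed method further improves performance over advanced methods, and the results also demonstrate the effectiveness of each module of the model.
Extractive Question Answering (EQA) in the few-shot learning scenario is one of the most challenging tasks of Machine Reading Comprehension (MRC). Some previous works employ external knowledge for data augmentation to improve the performance of few-shot extractive question answering. However, external knowledge, and the language- and domain-specific NLP tools needed to process it (such as part-of-speech taggers, syntactic parsers, and named-entity recognizers), are not always available. In this paper, we present a novel Plug-and-Play Data Augmentation Component (PPDAC) for few-shot extractive question answering, which includes a paraphrase generator and a paraphrase selector. Specifically, we generate multiple paraphrases of the question in the (question, passage, answer) triples using the paraphrase generator and then obtain highly similar statements via the paraphrase selector to form more training data for fine-tuning. Extensive experiments on multiple EQA datasets show that our proposed plug-and-play data augmentation component significantly improves question-answering performance, and consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.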
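The generate-then-select idea in the PPDAC abstract can be illustrated with a small sketch. The abstract does not say how the selector measures similarity, so token-level Jaccard overlap is used here purely as a stand-in assumption. The candidate paraphrases are hard-coded where a real generator would supply them, and all function names are hypothetical.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity (an assumed stand-in for the selector's score)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_paraphrases(question, candidates, top_k=2):
    """Keep the top_k candidates most similar to the original question."""
    ranked = sorted(candidates, key=lambda c: jaccard(question, c), reverse=True)
    return ranked[:top_k]


def augment(triple, candidates, top_k=2):
    """Build extra (question, passage, answer) triples from selected paraphrases."""
    question, passage, answer = triple
    selected = select_paraphrases(question, candidates, top_k)
    return [(paraphrase, passage, answer) for paraphrase in selected]


# Hypothetical data: in practice the candidates would come from a paraphrase generator.
triple = ("who wrote the novel", "The novel was written by Jane Doe.", "Jane Doe")
candidates = [
    "who wrote the book",
    "what is the weather",
    "who is the author of the novel",
]
extra_training_data = augment(triple, candidates)
```

Only the question is paraphrased. The passage and answer are reused unchanged, so each new triple keeps the original answer span.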

2023

Rumors spread rapidly through online social microblogs at a relatively low cost, causing substantial economic losses and negative consequences in our daily lives. Existing rumor detection models often neglect the underlying semantic coherence between text and image components in multimodal posts, as well as the challenges posed by incomplete modalities in single-modal posts, such as missing text or images. This paper presents CLKD-IMRD, a novel framework for Incomplete Modality Rumor Detection. CLKD-IMRD employs Contrastive Learning and Knowledge Distillation to capture the semantic consistency between text and image pairs, while also enhancing model generalization to incomplete modalities within individual posts. Extensive experimental results demonstrate that CLKD-IMRD outperforms state-of-the-art methods on two English and two Chinese benchmark datasets for rumor detection in social media.

2020

Systematic Generalization refers to a learning algorithm’s ability to extrapolate learned behavior to unseen situations that are distinct but semantically similar to its training data. As shown in recent work, state-of-the-art deep learning models fail dramatically even on tasks for which they are designed when the test set is systematically different from the training data. We hypothesize that explicitly modeling the relations between objects in their contexts while learning their representations will help achieve systematic generalization. Therefore, we propose a novel method that learns objects’ contextualized embeddings with dynamic message passing conditioned on the input natural language and end-to-end trainable with other downstream deep learning modules. To our knowledge, this model is the first one that significantly outperforms the provided baseline and reaches state-of-the-art performance on grounded SCAN (gSCAN), a grounded natural language navigation dataset designed to require systematic generalization in its test splits.