Jiale Han

Also published as: JiaLe Han

2025

Explain-Analyze-Generate: A Sequential Multi-Agent Collaboration Method for Complex Reasoning
WenYuan Gu | JiaLe Han | HaoWen Wang | Xiang Li | Bo Cheng
Proceedings of the 31st International Conference on Computational Linguistics

Exploring effective collaboration among multiple large language models (LLMs) represents an active research direction, with multi-agent debate (MAD) emerging as a popular approach. MAD involves LLMs independently generating responses and then refining their own responses by incorporating feedback from other agents in a debate-like manner. However, empirical experiments reveal the suboptimal performance of MAD in complex reasoning scenarios. We attribute this to the potential for agents to be misled by peers with limited individual capabilities. To address this, we propose a novel sequential collaboration framework named Explain-Analyze-Generate (EAG). By decomposing complex tasks into essential subtasks and employing a pipeline approach, EAG enables agents to provide constructive assistance to peers, ultimately yielding higher performance. We conduct experiments on the comprehensive complex language reasoning benchmark BIG-Bench-Hard (BBH). Our method achieves the highest performance on 19 out of 23 tasks, with an average improvement of 8% across all tasks, and incurs lower costs compared to MAD, demonstrating its effectiveness and efficiency.

pdf bib abs

Video question answering (VideoQA) has recently gained considerable attention in the field of computer vision, aiming to generate answers that rely on both linguistic and visual reasoning. However, existing methods often align visual or textual features directly with large language models, which limits the deep semantic association between modalities and hinders a comprehensive understanding of the interactions within spatial and temporal contexts, ultimately leading to sub-optimal reasoning performance. To address this issue, we propose a novel temporal-aware framework for multi-modal video question answering, dubbed VideoQA-TA, which enhances the reasoning ability and accuracy of VideoQA by aligning videos and questions at fine-grained levels. Specifically, an effective Spatial-Temporal Attention mechanism (STA) is designed for video aggregation, transforming video features into spatial and temporal representations while attending to information at different levels. Furthermore, a Temporal Object Injection strategy (TOI) is proposed to align object-level and frame-level information within videos, which further improves accuracy by injecting explicit temporal information. Experimental results on the MSVD-QA, MSRVTT-QA, and ActivityNet-QA datasets demonstrate the superior performance of our proposed method compared with current state-of-the-art methods. In addition, visualization analysis further verifies the effectiveness of incorporating temporal information into videos.

Adapting General-Purpose Embedding Models to Private Datasets Using Keyword-based Retrieval
Yubai Wei | Jiale Han | Yi Yang
Findings of the Association for Computational Linguistics: ACL 2025

Text embedding models play a cornerstone role in AI applications, such as retrieval-augmented generation (RAG). While general-purpose text embedding models demonstrate strong performance on generic retrieval benchmarks, their effectiveness diminishes when applied to private datasets (e.g., company-specific proprietary data), which often contain specialized terminology and lingo. In this work, we introduce BMEmbed, a novel method for adapting general-purpose text embedding models to private datasets. By leveraging the well-established keyword-based retrieval technique (BM25), we construct supervisory signals from the ranking of keyword-based retrieval results to facilitate model adaptation. We evaluate BMEmbed across a range of domains, datasets, and models, showing consistent improvements in retrieval performance. Moreover, we provide empirical insights into how BM25-based signals contribute to improving embeddings by fostering alignment and uniformity, highlighting the value of this approach in adapting models to domain-specific data. We release the source code for the research community.
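The abstract above states that supervisory signals are built from the ranking of BM25 retrieval results. As a rough illustration of that ranking step only, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the BMEmbed implementation: the function names `bm25_scores` and `rank_by_bm25`, the defaults `k1=1.5` and `b=0.75`, the whitespace-style pre-tokenized input, and the particular IDF smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    k1 and b are conventional default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank_by_bm25(query, docs):
    """Return document indices ordered from highest to lowest BM25 score."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

A ranking produced this way could, for instance, be used to pick higher-ranked and lower-ranked passages for a query when fine-tuning an embedding model; how the paper actually converts rankings into training signals is not specified in the abstract.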

2024

pdf bib abs

Neural Text-to-Speech (TTS) systems find broad applications in voice assistants, e-learning, and audiobook creation. The pursuit of modern models, like Diffusion Models (DMs), holds promise for achieving high-fidelity, real-time speech synthesis. Yet, the efficiency of multi-step sampling in Diffusion Models presents challenges. Efforts have been made to integrate GANs with DMs, speeding up inference by approximating denoising distributions, but this introduces issues with model convergence due to adversarial training. To overcome this, we introduce CM-TTS, a novel architecture grounded in consistency models (CMs). Drawing inspiration from continuous-time diffusion models, CM-TTS achieves top-quality speech synthesis in fewer steps without adversarial training or pre-trained model dependencies. We further design weighted samplers to incorporate different sampling positions into model training with dynamic probabilities, ensuring unbiased learning throughout the entire training process. We present a real-time mel-spectrogram generation consistency model, validated through comprehensive evaluations. Experimental results underscore CM-TTS’s superiority over existing single-step speech synthesis systems, representing a significant advancement in the field.

Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors
Shengkun Ma | Jiale Han | Yi Liang | Bo Cheng
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Continual Few-shot Relation Extraction (CFRE) is a practical problem that requires the model to continuously learn novel relations while avoiding forgetting old ones with little labeled training data. The primary challenges are catastrophic forgetting and overfitting. This paper harnesses prompt learning to explore the implicit capabilities of pre-trained language models to address the above two challenges, thereby making language models better continual few-shot relation extractors. Specifically, we propose a Contrastive Prompt Learning framework, which designs prompt representation to acquire more generalized knowledge that can be easily adapted to old and new categories, and margin-based contrastive learning to focus more on hard samples, thereby alleviating catastrophic forgetting and overfitting issues. To further remedy overfitting in low-resource scenarios, we introduce an effective memory augmentation strategy that employs well-crafted prompts to guide ChatGPT in generating diverse samples. Extensive experiments demonstrate that our method outperforms state-of-the-art methods by a large margin and significantly mitigates catastrophic forgetting and overfitting in low-resource scenarios.

2022

Generative Prompt Tuning for Relation Classification
Jiale Han | Shuai Zhao | Bo Cheng | Shengkun Ma | Wei Lu
Findings of the Association for Computational Linguistics: EMNLP 2022

Using prompts to explore the knowledge contained within pre-trained language models for downstream tasks has now become an active topic. Current prompt tuning methods mostly convert the downstream tasks to masked language modeling problems by adding cloze-style phrases and mapping all labels to verbalizations with fixed length, which has proven effective for tasks with simple label spaces. However, when applied to relation classification, which exhibits complex label spaces, vanilla prompt tuning methods may struggle with label verbalizations of arbitrary length due to rigid prompt restrictions. Inspired by the text infilling task for pre-training generative models that can flexibly predict missing spans, we propose a novel generative prompt tuning method to reformulate relation classification as an infilling problem, which frees our approach from limitations of current prompt-based approaches and thus fully exploits rich semantics of entity and relation types. In addition, we design entity-guided decoding and discriminative relation scoring to generate and align relations effectively and efficiently during inference. Extensive experiments under fully supervised settings and low-resource settings demonstrate the effectiveness of our approach.

2021

Exploring Task Difficulty for Few-Shot Relation Extraction
Jiale Han | Bo Cheng | Wei Lu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Few-shot relation extraction (FSRE) focuses on recognizing novel relations by learning with merely a handful of annotated instances. Meta-learning has been widely adopted for such a task, which trains on randomly generated few-shot tasks to learn generic data representations. Despite impressive results achieved, existing models still perform suboptimally when handling hard FSRE tasks, where the relations are fine-grained and similar to each other. We argue this is largely because existing models do not distinguish hard tasks from easy ones in the learning process. In this paper, we introduce a novel approach based on contrastive learning that learns better representations by exploiting relation label information. We further design a method that allows the model to adaptively learn how to focus on hard tasks. Experiments on two standard datasets demonstrate the effectiveness of our method.

2020

pdf bib abs

The structural information of Knowledge Bases (KBs) has proven effective for Question Answering (QA). Previous studies rely on deep graph neural networks (GNNs) to capture rich structural information, which may fail to model node relations over particularly long distances due to the over-smoothing issue. To address this challenge, we propose a novel framework, GlobalGraph, which models long-distance node relations from two views: 1) Node type similarity: GlobalGraph assigns each node a global type label and models long-distance node relations through the similarity of these global type labels; 2) Correlation between nodes and questions: we learn similarity scores between nodes and the question, and model long-distance node relations through the sum of the scores of two nodes. We conduct extensive experiments on two widely used multi-hop KBQA datasets to demonstrate the effectiveness of our method.

Open Domain Question Answering based on Text Enhanced Knowledge Graph with Hyperedge Infusion
Jiale Han | Bo Cheng | Xu Wang
Findings of the Association for Computational Linguistics: EMNLP 2020

The incompleteness of knowledge bases (KBs) is a vital factor limiting the performance of question answering (QA). This paper proposes a novel QA method that leverages text information to enhance the incomplete KB. The model enriches the entity representation through semantic information contained in the text, and employs graph convolutional networks to update the entity states. Furthermore, to exploit the latent structural information of text, we treat the text as hyperedges connecting the entities mentioned in it to complement the deficient relations in the KB, and hypergraph convolutional networks are further applied to reason on the hypergraph-formed text. Extensive experiments on the WebQuestionsSP benchmark with different KB settings demonstrate the effectiveness of our model.
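To make the idea of text as hyperedges concrete, the following is a minimal sketch of the underlying data structure only, not the paper's model. Each passage is treated as one hyperedge that connects every entity it mentions, and the result is stored as an entity-by-passage incidence matrix. The function names `build_incidence` and `hyperedge_neighbors` and the toy inputs are hypothetical, and entity linking is assumed to have been done already.

```python
def build_incidence(entities, passages):
    """Build a 0/1 incidence matrix with one row per entity and one column
    per passage; entry [i][j] is 1 when passage j mentions entity i."""
    index = {entity: row for row, entity in enumerate(entities)}
    incidence = [[0] * len(passages) for _ in entities]
    for col, mentioned in enumerate(passages):
        for entity in mentioned:
            incidence[index[entity]][col] = 1
    return incidence


def hyperedge_neighbors(entity, passages):
    """Return entities that share at least one passage (hyperedge) with `entity`."""
    return sorted(
        {
            other
            for mentioned in passages
            if entity in mentioned
            for other in mentioned
            if other != entity
        }
    )
```

Two entities that have no direct KB relation become connected here whenever a passage mentions both, which is the sense in which text can complement missing relations.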