2024
360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System
Shen Gao | Hao Li | Zhengliang Shi | Chengrui Huang | Quan Tu | Shuo Shang | Zhiliang Tian | Minlie Huang
Findings of the Association for Computational Linguistics: ACL 2024
Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, only conducting self-evaluation or removing underperforming agents does not substantively enhance the capability of the agents. We argue that comprehensive evaluation, combined with accumulating experience from evaluation feedback, is an effective approach to improving system performance. In this paper, we propose Reusable Experience Accumulation with 360° Assessment (360°REA), a hierarchical multi-agent framework inspired by corporate organizational practices. The framework employs a novel 360° performance assessment method for fine-grained, multi-perspective performance evaluation. To enhance the capability of agents in addressing complex tasks, we introduce a dual-level experience pool in which agents accumulate experience through fine-grained assessment. Extensive experiments on complex task datasets demonstrate the effectiveness of 360°REA.
Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering
Zhengliang Shi | Shuo Zhang | Weiwei Sun | Shen Gao | Pengjie Ren | Zhumin Chen | Zhaochun Ren
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The Multi-Hop Question Answering (MHQA) task presents a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potential documents from an external corpus and read an answer from them. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable noise in the retrieved documents. To mitigate these challenges, we introduce a novel generate-then-ground (GenGround) framework, synergizing the parametric knowledge of LLMs and external documents to solve a multi-hop question. GenGround empowers LLMs to alternate between two phases until the final answer is derived: (1) formulate a simpler, single-hop question and directly generate the answer; (2) ground the question-answer pair in retrieved documents, amending any wrong predictions in the answer. We also propose an instructional grounding distillation method to generalize our method to smaller models. Extensive experiments conducted on four datasets illustrate the superiority of our method. To further facilitate future research, we have collected a dataset that traces the reasoning process.
2023
RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue
Zhengliang Shi | Weiwei Sun | Shuo Zhang | Zhen Zhang | Pengjie Ren | Zhaochun Ren
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Evaluating open-domain dialogue systems is challenging for reasons such as the one-to-many problem, i.e., there are many appropriate responses other than just the gold response. As of now, automatic evaluation methods need better consistency with humans, while reliable human evaluation can be time- and cost-intensive. To this end, we propose the Reference-Assisted Dialogue Evaluation (RADE) approach under the multi-task learning framework, which leverages a pre-created utterance as a reference other than the gold response to relieve the one-to-many problem. Specifically, RADE explicitly compares the reference and the candidate response to predict their overall scores. Moreover, an auxiliary response generation task enhances prediction via a shared encoder. To support RADE, we extend three datasets, through human annotation, with additional rated responses beyond the single gold response. Experiments on our three datasets and two existing benchmarks demonstrate the effectiveness of our method, where Pearson, Spearman, and Kendall correlations with human evaluation outperform state-of-the-art baselines.
Towards a Unified Framework for Reference Retrieval and Related Work Generation
Zhengliang Shi | Shen Gao | Zhen Zhang | Xiuying Chen | Zhumin Chen | Pengjie Ren | Zhaochun Ren
Findings of the Association for Computational Linguistics: EMNLP 2023
The task of related work generation aims to generate a comprehensive survey of related research topics automatically, saving time and effort for authors. Existing methods simplify this task by using human-annotated references in a large-scale scientific corpus as information sources, which is time- and cost-intensive. To this end, we propose a Unified Reference Retrieval and Related Work Generation Model (UR3WG), which combines reference retrieval and related work generation processes in a unified framework based on a large language model (LLM). Specifically, UR3WG first leverages the world knowledge of the LLM to extend the abstract and generate the query for the subsequent retrieval stage. Then a lexicon-enhanced dense retrieval is proposed to search relevant references, where an importance-aware representation of the lexicon is introduced. We also propose multi-granularity contrastive learning to optimize our retriever. Because the task requires not merely summarizing the main points of the references but also analyzing their complex relationships and presenting them logically, we propose an instruction-tuning method that leverages the LLM to generate related work. Extensive experiments on two widely used datasets demonstrate that our model outperforms the state-of-the-art baselines in both generation and retrieval metrics.