2024
pdf
bib
abs
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
Zehui Chen
|
Kuikun Liu
|
Qiuchen Wang
|
Wenwei Zhang
|
Jiangning Liu
|
Dahua Lin
|
Kai Chen
|
Feng Zhao
Findings of the Association for Computational Linguistics: ACL 2024
Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks, however, they are still far inferior to API-based models when acting as agents. How to integrate agent ability into general LLMs becomes a crucial and urgent problem.This paper first delivers three key observations: (1) the current agent training corpus is entangled with both formats following and agent reasoning, which significantly shifts from the distribution of its pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches have side-effects when improving agent abilities by introducing hallucinations. Based on the above findings, we propose Agent-FLAN to effectively Fine-tune LANguage models for Agents.Through careful decomposition and redesign of the training corpus, Agent-FLAN enables Llama2-7B to outperform prior best works by 3.5% across various agent evaluation datasets. With comprehensively constructed negative samples, Agent-FLAN greatly alleviates the hallucination issues based on our established evaluation benchmark. Besides, it consistently improves the agent capability of LLMs when scaling model sizes while slightly enhancing the general capability of LLMs. The code and models are available at https://github.com/InternLM/Agent-FLAN.
pdf
bib
abs
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
Zehui Chen
|
Weihua Du
|
Wenwei Zhang
|
Kuikun Liu
|
Jiangning Liu
|
Miao Zheng
|
Jingming Zhuo
|
Songyang Zhang
|
Dahua Lin
|
Kai Chen
|
Feng Zhao
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have achieved remarkable performance on various NLP tasks and are augmented by tools for broader applications. Yet, how to evaluate and analyze the tool utilization capability of LLMs is still under-explored. In contrast to previous works that evaluate models holistically, we comprehensively decompose the tool utilization into multiple sub-processes, including instruction following, planning, reasoning, retrieval, understanding, and review. Based on that, we further introduce T-Eval to evaluate the tool-utilization capability step by step. T-Eval disentangles the tool utilization evaluation into several sub-domains along model capabilities, facilitating the inner understanding of both holistic and isolated competency of LLMs. We conduct extensive experiments on T-Eval and in-depth analysis of various LLMs. T-Eval not only exhibits consistency with the outcome-oriented evaluation but also provides a more fine-grained analysis of the capabilities of LLMs, providing a new perspective in LLM evaluation on tool-utilization ability. The benchmark will be available.
pdf
bib
abs
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
Zaibin Zhang
|
Yongting Zhang
|
Lijun Li
|
Jing Shao
|
Hongzhi Gao
|
Yu Qiao
|
Lijun Wang
|
Huchuan Lu
|
Feng Zhao
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit profound capabilities in collective intelligence. However, the potential misuse of this intelligence for malicious purposes presents significant risks. To date, comprehensive research on the safety issues associated with multi-agent systems remains limited. In this paper, we explore these concerns through the innovative lens of agent psychology, revealing that the dark psychological states of agents constitute a significant threat to safety.To tackle these concerns, we propose a comprehensive framework (PsySafe) grounded in agent psychology, focusing on three key areas: firstly, identifying how dark personality traits in agents can lead to risky behaviors; secondly, evaluating the safety of multi-agent systems from the psychological and behavioral perspectives, and thirdly, devising effective strategies to mitigate these risks.Our experiments reveal several intriguing phenomena, such as the collective dangerous behaviors among agents, agents’ self-reflection when engaging in dangerous behavior, and the correlation between agents’ psychological assessments and dangerous behaviors. We anticipate that our framework and observations will provide valuable insights for further research into the safety of multi-agent systems. We make our data and code publicly accessible at https://github.com/AI4Good24/PsySafe.
pdf
bib
abs
Correcting Language Model Bias for Text Classification in True Zero-Shot Learning
Feng Zhao
|
Wan Xianlin
|
Cheng Yan
|
Chu Kiong Loo
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Combining pre-trained language models (PLMs) and manual templates is a common practice for text classification in zero-shot scenarios. However, the effect of this approach is highly volatile, ranging from random guesses to near state-of-the-art results, depending on the quality of the manual templates. In this paper, we show that this instability stems from the fact that language models tend toward predicting certain label words of text classification, and manual templates can influence this tendency. To address this, we develop a novel pipeline for annotating and filtering a few examples from unlabeled examples. Moreover, we propose a new method to measure model bias on label words that utilizes unlabeled examples as a validation set when tuning language models. Our approach does not require any pre-labeled examples. Experimental results on six text classification tasks demonstrate that the proposed approach significantly outperforms standard prompt learning in zero-shot settings, achieving up to 19.7% absolute improvement and 13.8% average improvement. More surprisingly, on IMDB and SST-2, our approach even exceeds all few-shot baselines.
2023
pdf
bib
abs
Structure-aware Knowledge Graph-to-text Generation with Planning Selection and Similarity Distinction
Feng Zhao
|
Hongzhi Zou
|
Cheng Yan
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
The knowledge graph-to-text (KG-to-text) generation task aims to synthesize coherent and engaging sentences that accurately convey the complex information derived from an input knowledge graph. One of the primary challenges in this task is bridging the gap between the diverse structures of the KG and the target text, while preserving the details of the input KG. To address this, we propose a novel approach that efficiently integrates graph structure-aware modules with pre-trained language models. Unlike conventional techniques, which only consider direct connections between first-order neighbors, our method delves deeper by incorporating Relative Distance Encoding as a bias within the graph structure-aware module. This enables our model to better capture the intricate topology information present in the KG. To further elevate the fidelity of the generated text, Planning Selection and Similarity Distinction are introduced. Our approach filters the most relevant linearized sequences by employing a planning scorer, while simultaneously distinguishing similar input KGs through contrastive learning techniques. Experiments on two datasets demonstrate the superiority of our model.
2022
pdf
bib
abs
RelCLIP: Adapting Language-Image Pretraining for Visual Relationship Detection via Relational Contrastive Learning
Yi Zhu
|
Zhaoqing Zhu
|
Bingqian Lin
|
Xiaodan Liang
|
Feng Zhao
|
Jianzhuang Liu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Conventional visual relationship detection models only use the numeric ids of relation labels for training, but ignore the semantic correlation between the labels, which leads to severe training biases and harms the generalization ability of representations. In this paper, we introduce compact language information of relation labels for regularizing the representation learning of visual relations. Specifically, we propose a simple yet effective visual Relationship prediction framework that transfers natural language knowledge learned from Contrastive Language-Image Pre-training (CLIP) models to enhance the relationship prediction, termed RelCLIP. Benefiting from the powerful visual-semantic alignment ability of CLIP at image level, we introduce a novel Relational Contrastive Learning (RCL) approach which explores relation-level visual-semantic alignment via learning to match cross-modal relational embeddings. By collaboratively learning the semantic coherence and discrepancy from relation triplets, the model can generate more discriminative and robust representations. Experimental results on the Visual Genome dataset show that RelCLIP achieves significant improvements over strong baselines under full (provide accurate labels) and distant supervision (provide noise labels), demonstrating its powerful generalization ability in learning relationship representations. Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/RelCLIP.
pdf
bib
abs
Can Language Models Serve as Temporal Knowledge Bases?
Ruilin Zhao
|
Feng Zhao
|
Guandong Xu
|
Sixiao Zhang
|
Hai Jin
Findings of the Association for Computational Linguistics: EMNLP 2022
Recent progress regarding the use of language models (LMs) as knowledge bases (KBs) has shown that language models can act as structured knowledge bases for storing relational facts. However, most existing works only considered the LM-as-KB paradigm in a static setting, which ignores the analysis of temporal dynamics of world knowledge. Furthermore, a basic function of KBs, i.e., the ability to store conflicting information (i.e., 1-N, N-1, and N-M relations), is underexplored. In this paper, we formulate two practical requirements for treating LMs as temporal KBs: (i) The capacity to store temporally-scoped knowledge that contains conflicting information and (ii) the ability to use stored knowledge for temporally-scoped knowledge queries. We introduce a new dataset called LAMA-TK which is aimed at probing temporally-scoped knowledge, and investigate the two above requirements to explore the LM-as-KB paradigm in the temporal domain. On the one hand, experiments show that LMs can memorize millions of temporally-scoped facts with relatively high accuracy and transfer stored knowledge to temporal knowledge queries, thereby expanding the LM-as-KB paradigm to the temporal domain. On the other hand, we show that memorizing conflicting information, which has been neglected by previous works, is still challenging for LMs and hinders the memorization of other unrelated one-to-one relationships.
pdf
bib
abs
OpticE: A Coherence Theory-Based Model for Link Prediction
Xiangyu Gui
|
Feng Zhao
|
Langjunqing Jin
|
Hai Jin
Proceedings of the 29th International Conference on Computational Linguistics
Knowledge representation learning is a key step required for link prediction tasks with knowledge graphs (KGs). During the learning process, the semantics of each entity are embedded by a vector or a point in a feature space. The distance between these points is a measure of semantic similarity. However, in a KG, while two entities may have similar semantics in some relations, they have different semantics in others. It is ambiguous to assign a fixed distance to depict the variant semantic similarity of entities. To alleviate the semantic ambiguity in KGs, we design a new embedding approach named OpticE, which is derived from the well-known physical phenomenon of optical interference. It is a lightweight and relation-adaptive model based on coherence theory, in which each entity’s semantics vary automatically regarding different relations. In addition, a unique negative sampling method is proposed to combine the multimapping properties and self-adversarial learning during the training process. The experimental results obtained on practical KG benchmarks show that the OpticE model, with elegant structures, can compete with existing link prediction methods.