Yanan Chen


2024

pdf bib
Athena: Safe Autonomous Agents with Verbal Contrastive Learning
Tanmana Sadhu | Ali Pesaranghader | Yanan Chen | Dong Hoon Yi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

Due to emergent capabilities, large language models (LLMs) have been utilized as language-based agents to perform a variety of tasks and make decisions with an increasing degree of autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and execute complex tasks using a selection of tools available to them. As the capabilities of the agents expand, ensuring their safety and trustworthiness becomes more imperative. In this study, we introduce the Athena framework which leverages the concept of verbal contrastive learning where past safe and unsafe trajectories are used as in-context (contrastive) examples to guide the agent towards safety while fulfilling a given task. The framework also incorporates a critiquing mechanism to guide the agent to prevent risky actions at every step. Furthermore, due to the lack of existing benchmarks on the safety reasoning ability of LLM-based agents, we curate a set of 80 toolkits across 8 categories with 180 scenarios to provide a safety evaluation benchmark. Our experimental evaluation, with both closed- and open-source LLMs, indicates verbal contrastive learning and interaction-level critiquing improve the safety rate significantly.

2022

pdf bib
Rethinking Data Augmentation in Text-to-text Paradigm
Yanan Chen | Yang Liu
Proceedings of the 29th International Conference on Computational Linguistics

As manually labelling data can be costly, some recent studies tend to augment the training data for improving the generalization power of machine learning models, known as data augmentation (DA). With the arise of pre-trained language models (PLMs), some recent works on DA try to synthesize new samples benefiting from the knowledge learned from PLM’s pre-training. Along the same direction, we in this paper propose to integrate text-to-text language models and construct a new two-phase framework for augmentation: 1) a fine-tuning phase where PLMs are well adapted to downstream classification with the help of two novel schemes, and 2) a generation phase where the fine-tuned models are leveraged to create new samples for performance lifting. This paradigm opens up a new way of designing fine-tuning scheme to better serve DA in an easy-to-implement manner, and can be easily extended to other desired tasks. We evaluate our proposal on two public classification datasets and demonstrate its effectiveness with remarkable gains.