2024
pdf
bib
abs
LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay
Yihuai Lan
|
Zhiqiang Hu
|
Lei Wang
|
Yang Wang
|
Deheng Ye
|
Peilin Zhao
|
Ee-Peng Lim
|
Hui Xiong
|
Hao Wang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
This paper explores the open research problem of understanding the social behaviors of LLM-based agents. Using Avalon as a testbed, we employ system prompts to guide LLM agents in gameplay. While previous studies have touched on gameplay with LLM agents, research on their social behaviors is lacking. We propose a novel framework, tailored for Avalon, features a multi-agent system facilitating efficient communication and interaction. We evaluate its performance based on game success and analyze LLM agents’ social behaviors. Results affirm the framework’s effectiveness in creating adaptive agents and suggest LLM-based agents’ potential in navigating dynamic social interactions. By examining collaboration and confrontation behaviors, we offer insights into this field’s research and applications.
pdf
bib
abs
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
Wenhao Shi
|
Zhiqiang Hu
|
Yi Bin
|
Junhua Liu
|
Yang Yang
|
See-Kiong Ng
|
Lidong Bing
|
Roy Ka-Wei Lee
Findings of the Association for Computational Linguistics: EMNLP 2024
Large language models (LLMs) have demonstrated impressive reasoning capabilities, particularly in textual mathematical problem-solving. However, existing open-source image instruction fine-tuning datasets, containing limited question-answer pairs per image, do not fully exploit visual information to enhance the multimodal mathematical reasoning capabilities of Multimodal LLMs (MLLMs). To bridge this gap, we address the lack of high-quality, diverse multimodal mathematical datasets by collecting 40K high-quality images with question-answer pairs from 24 existing datasets and synthesizing 320K new pairs, creating the MathV360K dataset, which enhances both the breadth and depth of multimodal mathematical questions. We introduce Math-LLaVA, a LLaVA-1.5-based model fine-tuned with MathV360K. This novel approach significantly improves the multimodal mathematical reasoning capabilities of LLaVA-1.5, achieving a 19-point increase and comparable performance to GPT-4V on MathVista’s minitest split, and yielding leading performance on Math-V and MathVerse. Furthermore, Math-LLaVA demonstrates enhanced generalizability, showing substantial improvements on the MMMU benchmark. Our research highlights the importance of dataset diversity and synthesis in advancing MLLMs’ mathematical reasoning abilities. The code and data are available at:
https://github.com/HZQ950419/Math-LLaVA.
pdf
bib
abs
SeaLLMs - Large Language Models for Southeast Asia
Xuan-Phi Nguyen
|
Wenxuan Zhang
|
Xin Li
|
Mahani Aljunied
|
Zhiqiang Hu
|
Chenhui Shen
|
Yew Ken Chia
|
Xingxuan Li
|
Jianyu Wang
|
Qingyu Tan
|
Liying Cheng
|
Guanzheng Chen
|
Yue Deng
|
Sen Yang
|
Chaoqun Liu
|
Hang Zhang
|
Lidong Bing
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Despite the remarkable achievements of large language models (LLMs) in various tasks, there remains a linguistic bias that favors high-resource languages, such as English, often at the expense of low-resource and regional languages. To address this imbalance, we introduce SeaLLMs, an innovative series of language models that specifically focuses on Southeast Asian (SEA) languages. SeaLLMs are built upon popular English-centric models through continued pre-training with an extended vocabulary, specialized instruction and alignment tuning to better capture the intricacies of regional languages. This allows them to respect and reflect local cultural norms, customs, stylistic preferences, and legal considerations. Our comprehensive evaluation demonstrates that SeaLLM models exhibit superior performance across a wide spectrum of linguistic tasks and assistant-style instruction-following capabilities relative to comparable open-source models. Moreover, they outperform ChatGPT-3.5 in non-Latin languages, such as Thai, Khmer, Lao, and Burmese, by large margins while remaining lightweight and cost-effective to operate.
2023
pdf
bib
abs
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Lei Wang
|
Wanyu Xu
|
Yihuai Lan
|
Zhiqiang Hu
|
Yunshi Lan
|
Roy Ka-Wei Lee
|
Ee-Peng Lim
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have recently been shown to deliver impressive performance in various NLP tasks. To tackle multi-step reasoning tasks, Few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations which enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate the manual efforts, Zero-shot-CoT concatenates the target problem statement with “
Let’s think step by step” as an input prompt to LLMs. Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address the missing-step errors, we propose Plan-and-Solve (PS) Prompting. It consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate our proposed prompting strategy on ten datasets across three reasoning problems. The experimental results over GPT-3 show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought Prompting, and has comparable performance with 8-shot CoT prompting on the math reasoning problem. The code can be found at
https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.
pdf
bib
abs
Adapter-TST: A Parameter Efficient Method for Multiple-Attribute Text Style Transfer
Zhiqiang Hu
|
Nancy Chen
|
Roy Lee
Findings of the Association for Computational Linguistics: EMNLP 2023
Adapting a large language model for multiple-attribute text style transfer via fine-tuning can be challenging due to the substantial amount of computational resources and labeled data required for the specific downstream task. In this paper, we address this challenge by introducing Adapter-TST, a framework that freezes the pre-trained model’s original parameters and enables the development of a multiple-attribute text style transfer model. Using BART as the backbone model, Adapter-TST utilizes different neural adapters to model different types of attribute information, similar to a plug-in connected to BART. Our method allows control over multiple attributes (e.g. sentiment, tense, active or passive voice) and configures the adapters’ architecture to generate multiple outputs in respect to attributes or compositional editing on the same sentence. We evaluate the proposed model on both traditional sentiment transfer and multiple-attribute transfer tasks. The experiment results demonstrate that Adapter-TST outperforms all the state-of-the-art baselines with significantly less computational resources. We have also empirically shown that each adapter is able to characterize specific stylistic attributes effectively and can be configured to perform compositional editing.
pdf
bib
abs
Who Wrote it and Why? Prompting Large-Language Models for Authorship Verification
Chia-Yu Hung
|
Zhiqiang Hu
|
Yujia Hu
|
Roy Lee
Findings of the Association for Computational Linguistics: EMNLP 2023
Authorship verification (AV) is a fundamental task in natural language processing (NLP) and computational linguistics, with applications in forensic analysis, plagiarism detection, and identification of deceptive content. Existing AV techniques, including traditional stylometric and deep learning approaches, face limitations in terms of data requirements and lack of explainability. To address these limitations, this paper proposes PromptAV, a novel technique that leverages Large-Language Models (LLMs) for AV by providing step-by-step stylometric explanation prompts. PromptAV outperforms state-of-the-art baselines, operates effectively with limited training data, and enhances interpretability through intuitive explanations, showcasing its potential as an effective and interpretable solution for the AV task.
pdf
bib
abs
LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
Zhiqiang Hu
|
Lei Wang
|
Yihuai Lan
|
Wanyu Xu
|
Ee-Peng Lim
|
Lidong Bing
|
Xing Xu
|
Soujanya Poria
|
Roy Lee
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
The success of large language models (LLMs), like GPT-4 and ChatGPT, has led to the development of numerous cost-effective and accessible alternatives that are created by finetuning open-access LLMs with task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca). Among the various fine-tuning methods, adapter-based parameter-efficient fine-tuning (PEFT) is undoubtedly one of the most attractive topics, as it only requires fine-tuning a few external parameters instead of the entire LLMs while achieving comparable or even better performance. To enable further research on PEFT methods of LLMs, this paper presents LLM-Adapters, an easy-to-use framework that integrates various adapters into LLMs and can execute these adapter-based PEFT methods of LLMs for different tasks. The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapters such as Series adapters, Parallel adapter, Prompt-based learning and Reparametrization-based methods. Moreover, we conduct extensive empirical studies on the impact of adapter types, placement locations, and hyper-parameters to the best design for each adapter-based methods. We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning. The results demonstrate that using adapter-based PEFT in smaller-scale LLMs (7B) with few extra trainable parameters yields comparable, and in some cases superior, performance to powerful LLMs (175B) in zero-shot inference on simple math reasoning datasets.
2021
pdf
bib
abs
Syntax Matters! Syntax-Controlled in Text Style Transfer
Zhiqiang Hu
|
Roy Ka-Wei Lee
|
Charu C. Aggarwal
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Existing text style transfer (TST) methods rely on style classifiers to disentangle the text’s content and style attributes for text style transfer. While the style classifier plays a critical role in existing TST methods, there is no known investigation on its effect on the TST methods. In this paper, we conduct an empirical study on the limitations of the style classifiers used in existing TST methods. We demonstrated that the existing style classifiers cannot learn sentence syntax effectively and ultimately worsen existing TST models’ performance. To address this issue, we propose a novel Syntax-Aware Controllable Generation (SACG) model, which includes a syntax-aware style classifier that ensures learned style latent representations effectively capture the sentence structure for TST. Through extensive experiments on two popular text style transfer tasks, we show that our proposed method significantly outperforms twelve state-of-the-art methods. Our case studies have also demonstrated SACG’s ability to generate fluent target-style sentences that preserved the original content.
pdf
bib
abs
Improving Text Auto-Completion with Next Phrase Prediction
Dong-Ho Lee
|
Zhiqiang Hu
|
Roy Ka-Wei Lee
Findings of the Association for Computational Linguistics: EMNLP 2021
Language models such as GPT-2 have performed well on constructing syntactically sound sentences for text auto-completion tasks. However, such models often require considerable training effort to adapt to specific writing domains (e.g., medical). In this paper, we propose an intermediate training strategy to enhance pre-trained language models’ performance in the text auto-completion task and fastly adapt them to specific domains. Our strategy includes a novel self-supervised training objective called Next Phrase Prediction (NPP), which encourages a language model to complete the partial query with enriched phrases and eventually improve the model’s text auto-completion performance. Preliminary experiments have shown that our approach is able to outperform the baselines in auto-completion for email and academic-writing domains.