Rethinking Pragmatics in Large Language Models: Towards Open-Ended Evaluation and Preference Tuning

Shengguang Wu, Shusheng Yang, Zhenglun Chen, Qi Su


Abstract
This study addresses the challenges of assessing and enhancing social-pragmatic inference in large language models (LLMs). We first highlight the inadequacy of current accuracy-based multiple choice question answering (MCQA) formats in assessing social-pragmatic reasoning, and propose the direct evaluation of models’ free-form responses as a measure, which correlates better with human judgment. Furthermore, we explore methods to improve pragmatic abilities in LLMs, advocating for preference optimization (PO) over supervised finetuning (SFT), given the absence of a definitive “gold” answer in social contexts. Our results show that preferential tuning consistently outperforms SFT across pragmatic phenomena and offers a near-free lunch in pragmatic abilities without compromising general capabilities. Lastly, we examine the internal structure of LLMs, revealing that the significant boost in pragmatic reasoning is tied to deeper layer representations, analogous to human high-level thinking. Our experiments span a variety of pragmatic and social reasoning datasets, as well as an image referential game requiring a multimodal theory of mind (ToM). With our refined paradigms for evaluating and enhancing pragmatic inference, this paper offers key insights into building more socially aware language models.
Anthology ID:
2024.emnlp-main.1258
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
22583–22599
URL:
https://aclanthology.org/2024.emnlp-main.1258
Cite (ACL):
Shengguang Wu, Shusheng Yang, Zhenglun Chen, and Qi Su. 2024. Rethinking Pragmatics in Large Language Models: Towards Open-Ended Evaluation and Preference Tuning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 22583–22599, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Rethinking Pragmatics in Large Language Models: Towards Open-Ended Evaluation and Preference Tuning (Wu et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.1258.pdf