PUB: A Pragmatics Understanding Benchmark for Assessing LLMs’ Pragmatics Capabilities

Settaluri Sravanthi, Meet Doshi, Pavan Tankala, Rudra Murthy, Raj Dabre, Pushpak Bhattacharyya


Abstract
LLMs have demonstrated remarkable capability for understanding semantics, but their understanding of pragmatics is not well studied. To this end, we release a Pragmatics Understanding Benchmark (PUB) dataset consisting of fourteen tasks in four pragmatics phenomena, namely Implicature, Presupposition, Reference, and Deixis. We curate high-quality test sets for each task, consisting of Multiple Choice Question Answers (MCQA). PUB includes a total of 28k data points, of which 6.1k are newly annotated. We evaluate nine models varying in the number of parameters and type of training. Our study reveals several key observations about the pragmatic capabilities of LLMs: (1) chat-fine-tuning strongly benefits smaller models, (2) large base models are competitive with their chat-fine-tuned counterparts, (3) performance varies widely across different pragmatics phenomena, and (4) there is a noticeable performance gap between human capabilities and model capabilities. We hope that PUB will enable comprehensive evaluation of LLMs’ pragmatic reasoning capabilities.
Anthology ID:
2024.findings-acl.719
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12075–12097
URL:
https://aclanthology.org/2024.findings-acl.719
Cite (ACL):
Settaluri Sravanthi, Meet Doshi, Pavan Tankala, Rudra Murthy, Raj Dabre, and Pushpak Bhattacharyya. 2024. PUB: A Pragmatics Understanding Benchmark for Assessing LLMs’ Pragmatics Capabilities. In Findings of the Association for Computational Linguistics ACL 2024, pages 12075–12097, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
PUB: A Pragmatics Understanding Benchmark for Assessing LLMs’ Pragmatics Capabilities (Sravanthi et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.719.pdf