From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items

Melissa Roemmele, Andrew Gordon


Abstract
LLMs can now perform a variety of complex writing tasks. They also excel at answering questions involving natural language inference and commonsense reasoning. Composing these questions is itself a skilled writing task, so in this paper we consider LLMs as authors of commonsense assessment items. We prompt LLMs to generate items in the style of a prominent benchmark for commonsense reasoning, the Choice of Plausible Alternatives (COPA). We examine the resulting items through analyses carried out both by LLMs and by human annotators. We find that LLMs that succeed in answering the original COPA benchmark are also more successful in authoring their own items.
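The exact prompts are not reproduced on this page, so the following is a minimal, hypothetical sketch of how one might prompt a chat LLM to author a COPA-style item (a premise, two alternatives, an "asks-for" relation of cause or effect, and the intended answer). The model name, prompt wording, and openai client usage are illustrative assumptions rather than the authors' setup; the embedded example item is the well-known first item of the original COPA benchmark.

# Hypothetical sketch: prompting a chat LLM to author one COPA-style item.
# Assumes an OpenAI-compatible model that supports JSON-mode responses and
# that OPENAI_API_KEY is set in the environment; not the paper's exact method.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Write one new item in the style of the Choice of Plausible Alternatives (COPA) "
    "benchmark. Respond with a JSON object containing the keys "
    '"premise", "asks_for" ("cause" or "effect"), "alternative1", "alternative2", '
    'and "correct" (1 or 2). Example: {"premise": "The man broke his toe.", '
    '"asks_for": "cause", "alternative1": "He got a hole in his sock.", '
    '"alternative2": "He dropped a hammer on his foot.", "correct": 2}'
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the paper compares several different LLMs
    messages=[{"role": "user", "content": PROMPT}],
    response_format={"type": "json_object"},  # ask for parseable JSON output
)

# Parse and display the generated item (assumes the model returned valid JSON).
item = json.loads(response.choices[0].message.content)
print(item["premise"])
print("1:", item["alternative1"])
print("2:", item["alternative2"])

Generated items like this could then be checked, as the paper does, by having other LLMs answer them and by collecting human judgments of their plausibility and correctness.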
Anthology ID: 2024.findings-emnlp.299
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5193–5203
URL: https://aclanthology.org/2024.findings-emnlp.299/
DOI: 10.18653/v1/2024.findings-emnlp.299
Cite (ACL): Melissa Roemmele and Andrew Gordon. 2024. From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 5193–5203, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items (Roemmele & Gordon, Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.299.pdf