It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance

Arjun Subramonian, Xingdi Yuan, Hal Daumé III, Su Lin Blodgett


Abstract
Progress in NLP is increasingly measured through benchmarks; hence, contextualizing progress requires understanding when and why practitioners may disagree about the validity of benchmarks. We develop a taxonomy of disagreement, drawing on tools from measurement modeling, and distinguish between two types of disagreement: 1) how tasks are conceptualized and 2) how measurements of model performance are operationalized. To provide evidence for our taxonomy, we conduct a meta-analysis of relevant literature to understand how NLP tasks are conceptualized, as well as a survey of practitioners about their impressions of different factors that affect benchmark validity. Our meta-analysis and survey across eight tasks, ranging from coreference resolution to question answering, reveal that tasks are generally not clearly and consistently conceptualized and that benchmarks suffer from operationalization disagreements. These findings support our proposed taxonomy of disagreement. Finally, based on our taxonomy, we present a framework for constructing benchmarks and documenting their limitations.
Anthology ID:
2023.findings-acl.202
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3234–3279
URL:
https://aclanthology.org/2023.findings-acl.202
DOI:
10.18653/v1/2023.findings-acl.202
Cite (ACL):
Arjun Subramonian, Xingdi Yuan, Hal Daumé III, and Su Lin Blodgett. 2023. It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance. In Findings of the Association for Computational Linguistics: ACL 2023, pages 3234–3279, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance (Subramonian et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.202.pdf
Video:
https://aclanthology.org/2023.findings-acl.202.mp4