Common Law Annotations: Investigating the Stability of Dialog System Output Annotations

Seunggun Lee, Alexandra DeLucia, Nikita Nangia, Praneeth Ganedi, Ryan Guan, Rubing Li, Britney Ngaw, Aditya Singhal, Shalaka Vaidya, Zijun Yuan, Lining Zhang, João Sedoc


Abstract
Metrics for Inter-Annotator Agreement (IAA), like Cohen’s Kappa, are crucial for validating annotated datasets. Although high agreement is often used to show the reliability of annotation procedures, it is insufficient to ensure validity or reproducibility. While researchers are encouraged to increase annotator agreement, this can lead to specific and tailored annotation guidelines. We hypothesize that this may result in diverging annotations from different groups. To study this, we first propose the Lee et al. Protocol (LEAP), a standardized and codified annotation protocol. LEAP strictly enforces transparency in the annotation process, which ensures reproducibility of annotation guidelines. Using LEAP to annotate a dialog dataset, we empirically show that while research groups may create reliable guidelines by raising agreement, this can cause divergent annotations across different research groups, thus questioning the validity of the annotations. Therefore, we caution NLP researchers against using reliability as a proxy for reproducibility and validity.
Anthology ID:
2023.findings-acl.780
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12315–12349
URL:
https://aclanthology.org/2023.findings-acl.780
DOI:
10.18653/v1/2023.findings-acl.780
Cite (ACL):
Seunggun Lee, Alexandra DeLucia, Nikita Nangia, Praneeth Ganedi, Ryan Guan, Rubing Li, Britney Ngaw, Aditya Singhal, Shalaka Vaidya, Zijun Yuan, Lining Zhang, and João Sedoc. 2023. Common Law Annotations: Investigating the Stability of Dialog System Output Annotations. In Findings of the Association for Computational Linguistics: ACL 2023, pages 12315–12349, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Common Law Annotations: Investigating the Stability of Dialog System Output Annotations (Lee et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.780.pdf