Crowdsourcing Veridicality Annotations in Spanish: Can Speakers Actually Agree?

Teresa Martín Soeder


Abstract
Veridicality studies, an area of research within Natural Language Inference (NLI), evaluate the factuality of different contexts. This task is known to be difficult because it is often unclear what the interpretation should be (Uma et al., 2021), yet it is key for building any Natural Language Understanding (NLU) system that aims to make the right inferences. This paper presents the results of a study that analyzes the veridicality of mood alternation and specificity in Spanish, with labels based on those of Saurí and Pustejovsky (2009). The study yields an inter-annotator agreement of AC2 = 0.114, considerably lower than that of de Marneffe et al. (2012) (κ = 0.53), a main reference for this work, and a couple of significant mood-related effects. Given this strong lack of agreement, the paper analyzes which factors cause disagreement and, building on de Marneffe et al. (2012) and Pavlick and Kwiatkowski (2019), discusses the quality of the gathered annotations and whether other types of analysis, such as entropy distribution, could better represent this corpus. The collected annotations are available at https://github.com/narhim/veridicality_spanish.
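As a rough illustration of what an entropy-based view of such annotations might look like, the sketch below computes the Shannon entropy of the label distribution for each annotated item. It is not the paper's analysis code: the item identifiers, the example labels (written in the style of the Saurí and Pustejovsky factuality values), and the function name `label_entropy` are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def label_entropy(labels):
    """Shannon entropy (in bits) of the label distribution for one item."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Sum p * log2(1/p) over the labels that actually occur.
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical annotations: item id -> labels given by different annotators.
annotations = {
    "item_1": ["CT+", "CT+", "CT+", "CT+"],  # full agreement
    "item_2": ["CT+", "PR+", "CT+", "PR+"],  # even split between two labels
    "item_3": ["CT+", "PR+", "PS+", "Uu"],   # every annotator disagrees
}

entropies = {item: label_entropy(labels) for item, labels in annotations.items()}

for item, value in entropies.items():
    print(f"{item}: {value:.2f} bits")
```

An entropy of 0 bits means all annotators chose the same label, while higher values indicate that the judgments are spread over more labels. Unlike a single agreement coefficient, this keeps the disagreement visible item by item.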
Anthology ID:
2023.ranlp-stud.8
Volume:
Proceedings of the 8th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Momchil Hardalov, Zara Kancheva, Boris Velichkov, Ivelina Nikolova-Koleva, Milena Slavcheva
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
68–77
URL:
https://aclanthology.org/2023.ranlp-stud.8
Cite (ACL):
Teresa Martín Soeder. 2023. Crowdsourcing Veridicality Annotations in Spanish: Can Speakers Actually Agree?. In Proceedings of the 8th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing, pages 68–77, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Crowdsourcing Veridicality Annotations in Spanish: Can Speakers Actually Agree? (Martín Soeder, RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-stud.8.pdf