Exploring Reproducibility of Human-Labelled Data for Code-Mixed Sentiment Analysis

Sachin Sasidharan Nair, Tanvi Dinkar, Gavin Abercrombie


Abstract
Growing awareness of a ‘Reproducibility Crisis’ in natural language processing (NLP) has focused on human evaluations of generative systems. While labelling for supervised classification tasks makes up a large part of human input to systems, the reproduction of such efforts has thus far not been explored. In this paper, we re-implement a human data collection study for sentiment analysis of code-mixed Malayalam movie reviews, as well as automated classification experiments. We find that missing and under-specified information makes reproduction challenging, and we observe potentially consequential differences between the original labels and those we collect. Classification results indicate that the reliability of the labels is important for stable performance.
Anthology ID:
2024.humeval-1.11
Volume:
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues:
HumEval | WS
Publisher:
ELRA and ICCL
Pages:
114–124
URL:
https://aclanthology.org/2024.humeval-1.11
Cite (ACL):
Sachin Sasidharan Nair, Tanvi Dinkar, and Gavin Abercrombie. 2024. Exploring Reproducibility of Human-Labelled Data for Code-Mixed Sentiment Analysis. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 114–124, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Exploring Reproducibility of Human-Labelled Data for Code-Mixed Sentiment Analysis (Sasidharan Nair et al., HumEval-WS 2024)
PDF:
https://aclanthology.org/2024.humeval-1.11.pdf