Exploring Reproducibility of Human-Labelled Data for Code-Mixed Sentiment Analysis

Sachin Sasidharan Nair; Tanvi Dinkar; Gavin Abercrombie

Abstract

Growing awareness of a ‘Reproducibility Crisis’ in natural language processing (NLP) has focused on human evaluations of generative systems. While labelling for supervised classification tasks makes up a large part of human input to systems, the reproduction of such efforts has thus far not been explored. In this paper, we re-implement a human data collection study for sentiment analysis of code-mixed Malayalam movie reviews, as well as automated classification experiments. We find that missing and under-specified information makes reproduction challenging, and we observe potentially consequential differences between the original labels and those we collect. Classification results indicate that the reliability of the labels is important for stable performance.

Anthology ID: 2024.humeval-1.11
Volume: Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues: HumEval | WS
Publisher: ELRA and ICCL
Pages: 114–124
URL: https://aclanthology.org/2024.humeval-1.11/
Cite (ACL): Sachin Sasidharan Nair, Tanvi Dinkar, and Gavin Abercrombie. 2024. Exploring Reproducibility of Human-Labelled Data for Code-Mixed Sentiment Analysis. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 114–124, Torino, Italia. ELRA and ICCL.
Cite (Informal): Exploring Reproducibility of Human-Labelled Data for Code-Mixed Sentiment Analysis (Sasidharan Nair et al., HumEval 2024)
PDF: https://aclanthology.org/2024.humeval-1.11.pdf


Export citation

BibTeX
MODS XML
Endnote
Preformatted

@inproceedings{sasidharan-nair-etal-2024-exploring,
    title = "Exploring Reproducibility of Human-Labelled Data for Code-Mixed Sentiment Analysis",
    author = "Sasidharan Nair, Sachin  and
      Dinkar, Tanvi  and
      Abercrombie, Gavin",
    editor = "Balloccu, Simone  and
      Belz, Anya  and
      Huidrom, Rudali  and
      Reiter, Ehud  and
      Sedoc, Joao  and
      Thomson, Craig",
    booktitle = "Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.humeval-1.11/",
    pages = "114--124",
    abstract = "Growing awareness of a `Reproducibility Crisis' in natural language processing (NLP) has focused on human evaluations of generative systems. While labelling for supervised classification tasks makes up a large part of human input to systems, the reproduction of such efforts has thus far not been explored. In this paper, we re-implement a human data collection study for sentiment analysis of code-mixed Malayalam movie reviews, as well as automated classification experiments. We find that missing and under-specified information makes reproduction challenging, and we observe potentially consequential differences between the original labels and those we collect. Classification results indicate that the reliability of the labels is important for stable performance."
}


<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="sasidharan-nair-etal-2024-exploring">
    <titleInfo>
        <title>Exploring Reproducibility of Human-Labelled Data for Code-Mixed Sentiment Analysis</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Sachin</namePart>
        <namePart type="family">Sasidharan Nair</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Tanvi</namePart>
        <namePart type="family">Dinkar</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Gavin</namePart>
        <namePart type="family">Abercrombie</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2024-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Simone</namePart>
            <namePart type="family">Balloccu</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Anya</namePart>
            <namePart type="family">Belz</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Rudali</namePart>
            <namePart type="family">Huidrom</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ehud</namePart>
            <namePart type="family">Reiter</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Joao</namePart>
            <namePart type="family">Sedoc</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Craig</namePart>
            <namePart type="family">Thomson</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>ELRA and ICCL</publisher>
            <place>
                <placeTerm type="text">Torino, Italia</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Growing awareness of a ‘Reproducibility Crisis’ in natural language processing (NLP) has focused on human evaluations of generative systems. While labelling for supervised classification tasks makes up a large part of human input to systems, the reproduction of such efforts has thus far not been explored. In this paper, we re-implement a human data collection study for sentiment analysis of code-mixed Malayalam movie reviews, as well as automated classification experiments. We find that missing and under-specified information makes reproduction challenging, and we observe potentially consequential differences between the original labels and those we collect. Classification results indicate that the reliability of the labels is important for stable performance.</abstract>
    <identifier type="citekey">sasidharan-nair-etal-2024-exploring</identifier>
    <location>
        <url>https://aclanthology.org/2024.humeval-1.11/</url>
    </location>
    <part>
        <date>2024-05</date>
        <extent unit="page">
            <start>114</start>
            <end>124</end>
        </extent>
    </part>
</mods>
</modsCollection>


%0 Conference Proceedings
%T Exploring Reproducibility of Human-Labelled Data for Code-Mixed Sentiment Analysis
%A Sasidharan Nair, Sachin
%A Dinkar, Tanvi
%A Abercrombie, Gavin
%Y Balloccu, Simone
%Y Belz, Anya
%Y Huidrom, Rudali
%Y Reiter, Ehud
%Y Sedoc, Joao
%Y Thomson, Craig
%S Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
%D 2024
%8 May
%I ELRA and ICCL
%C Torino, Italia
%F sasidharan-nair-etal-2024-exploring
%X Growing awareness of a ‘Reproducibility Crisis’ in natural language processing (NLP) has focused on human evaluations of generative systems. While labelling for supervised classification tasks makes up a large part of human input to systems, the reproduction of such efforts has thus far not been explored. In this paper, we re-implement a human data collection study for sentiment analysis of code-mixed Malayalam movie reviews, as well as automated classification experiments. We find that missing and under-specified information makes reproduction challenging, and we observe potentially consequential differences between the original labels and those we collect. Classification results indicate that the reliability of the labels is important for stable performance.
%U https://aclanthology.org/2024.humeval-1.11/
%P 114-124


Markdown (Informal)

[Exploring Reproducibility of Human-Labelled Data for Code-Mixed Sentiment Analysis](https://aclanthology.org/2024.humeval-1.11/) (Sasidharan Nair et al., HumEval 2024)
