Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions

Mikhail Krasitskii; Olga Kolesnikova; Liliana Chanona Hernandez; Grigori Sidorov; Alexander Gelbukh

doi:10.18653/v1/2025.nlp4dh-1.27

Abstract

This study examines sentiment analysis in Tamil-English code-mixed texts using advanced transformer-based architectures. The unique linguistic challenges, including mixed grammar, orthographic variability, and phonetic inconsistencies, are addressed. Data limitations and annotation gaps are discussed, highlighting the need for larger datasets. The performance of models such as XLM-RoBERTa, mT5, IndicBERT, and RemBERT is evaluated, with insights into their optimization for low-resource, code-mixed environments.

Anthology ID: 2025.nlp4dh-1.27
Volume: Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
Month: May
Year: 2025
Address: Albuquerque, USA
Editors: Mika Hämäläinen, Emily Öhman, Yuri Bizzoni, So Miyagawa, Khalid Alnajjar
Venues: NLP4DH | WS
Publisher: Association for Computational Linguistics
Pages: 305–312
URL: https://aclanthology.org/2025.nlp4dh-1.27/
DOI: 10.18653/v1/2025.nlp4dh-1.27
Bibkey: krasitskii-etal-2025-advancing
Cite (ACL): Mikhail Krasitskii, Olga Kolesnikova, Liliana Chanona Hernandez, Grigori Sidorov, and Alexander Gelbukh. 2025. Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions. In Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, pages 305–312, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal): Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions (Krasitskii et al., NLP4DH 2025)
PDF: https://aclanthology.org/2025.nlp4dh-1.27.pdf

Export citation

BibTeX

@inproceedings{krasitskii-etal-2025-advancing,
    title = "Advancing Sentiment Analysis in {T}amil-{E}nglish Code-Mixed Texts: Challenges and Transformer-Based Solutions",
    author = "Krasitskii, Mikhail  and
      Kolesnikova, Olga  and
      Chanona Hernandez, Liliana  and
      Sidorov, Grigori  and
      Gelbukh, Alexander",
    editor = {H{\"a}m{\"a}l{\"a}inen, Mika  and
      {\"O}hman, Emily  and
      Bizzoni, Yuri  and
      Miyagawa, So  and
      Alnajjar, Khalid},
    booktitle = "Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities",
    month = may,
    year = "2025",
    address = "Albuquerque, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.nlp4dh-1.27/",
    doi = "10.18653/v1/2025.nlp4dh-1.27",
    pages = "305--312",
    ISBN = "979-8-89176-234-3",
    abstract = "This study examines sentiment analysis in Tamil-English code-mixed texts using advanced transformer-based architectures. The unique linguistic challenges, including mixed grammar, orthographic variability, and phonetic inconsistencies, are addressed. Data limitations and annotation gaps are discussed, highlighting the need for larger datasets. The performance of models such as XLM-RoBERTa, mT5, IndicBERT, and RemBERT is evaluated, with insights into their optimization for low-resource, code-mixed environments."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="krasitskii-etal-2025-advancing">
    <titleInfo>
        <title>Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Mikhail</namePart>
        <namePart type="family">Krasitskii</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Olga</namePart>
        <namePart type="family">Kolesnikova</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Liliana</namePart>
        <namePart type="family">Chanona Hernandez</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Grigori</namePart>
        <namePart type="family">Sidorov</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Alexander</namePart>
        <namePart type="family">Gelbukh</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2025-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Mika</namePart>
            <namePart type="family">Hämäläinen</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Emily</namePart>
            <namePart type="family">Öhman</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yuri</namePart>
            <namePart type="family">Bizzoni</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">So</namePart>
            <namePart type="family">Miyagawa</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Khalid</namePart>
            <namePart type="family">Alnajjar</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Albuquerque, USA</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">979-8-89176-234-3</identifier>
    </relatedItem>
    <abstract>This study examines sentiment analysis in Tamil-English code-mixed texts using advanced transformer-based architectures. The unique linguistic challenges, including mixed grammar, orthographic variability, and phonetic inconsistencies, are addressed. Data limitations and annotation gaps are discussed, highlighting the need for larger datasets. The performance of models such as XLM-RoBERTa, mT5, IndicBERT, and RemBERT is evaluated, with insights into their optimization for low-resource, code-mixed environments.</abstract>
    <identifier type="citekey">krasitskii-etal-2025-advancing</identifier>
    <identifier type="doi">10.18653/v1/2025.nlp4dh-1.27</identifier>
    <location>
        <url>https://aclanthology.org/2025.nlp4dh-1.27/</url>
    </location>
    <part>
        <date>2025-05</date>
        <extent unit="page">
            <start>305</start>
            <end>312</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions
%A Krasitskii, Mikhail
%A Kolesnikova, Olga
%A Chanona Hernandez, Liliana
%A Sidorov, Grigori
%A Gelbukh, Alexander
%Y Hämäläinen, Mika
%Y Öhman, Emily
%Y Bizzoni, Yuri
%Y Miyagawa, So
%Y Alnajjar, Khalid
%S Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
%D 2025
%8 May
%I Association for Computational Linguistics
%C Albuquerque, USA
%@ 979-8-89176-234-3
%F krasitskii-etal-2025-advancing
%X This study examines sentiment analysis in Tamil-English code-mixed texts using advanced transformer-based architectures. The unique linguistic challenges, including mixed grammar, orthographic variability, and phonetic inconsistencies, are addressed. Data limitations and annotation gaps are discussed, highlighting the need for larger datasets. The performance of models such as XLM-RoBERTa, mT5, IndicBERT, and RemBERT is evaluated, with insights into their optimization for low-resource, code-mixed environments.
%R 10.18653/v1/2025.nlp4dh-1.27
%U https://aclanthology.org/2025.nlp4dh-1.27/
%U https://doi.org/10.18653/v1/2025.nlp4dh-1.27
%P 305-312

Markdown (Informal)

[Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions](https://aclanthology.org/2025.nlp4dh-1.27/) (Krasitskii et al., NLP4DH 2025)