DFAT: Dual-stage Fusion of Acoustic and Text feature for Speech Emotion Recognition

Nhi Nguyen Yen Truong; Sang Le Quang; Huy Tran Quang; Tri Pham Xuan; Duong Tran Ham; Binh Tran Le Hai; Tin Huynh; Hoang Kiem

Anthology ID: 2025.vlsp-1.6
Volume: Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing
Month: October
Year: 2025
Address: Hanoi, Vietnam
Editors: Luong Chi Mai, Nguyen Thi Minh Huyen, Nguyen Thi Thu Trang
Venues: VLSP | WS
Publisher: Association for Computational Linguistics
Pages: 36–44
URL: https://aclanthology.org/2025.vlsp-1.6/
Cite (ACL): Nhi Nguyen Yen Truong, Sang Le Quang, Huy Tran Quang, Tri Pham Xuan, Duong Tran Ham, Binh Tran Le Hai, Tin Huynh, and Kiem Hoang. 2025. DFAT: Dual-stage Fusion of Acoustic and Text feature for Speech Emotion Recognition. In Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing, pages 36–44, Hanoi, Vietnam. Association for Computational Linguistics.
Cite (Informal): DFAT: Dual-stage Fusion of Acoustic and Text feature for Speech Emotion Recognition (Truong et al., VLSP 2025)
PDF: https://aclanthology.org/2025.vlsp-1.6.pdf

@inproceedings{truong-etal-2025-dfat,
    title = "{DFAT}: Dual-stage Fusion of Acoustic and Text feature for Speech Emotion Recognition",
    author = "Truong, Nhi Nguyen Yen  and
      Quang, Sang Le  and
      Quang, Huy Tran  and
      Xuan, Tri Pham  and
      Ham, Duong Tran  and
      Hai, Binh Tran Le  and
      Huynh, Tin  and
      Hoang, Kiem",
    editor = "Mai, Luong Chi  and
      Huyen, Nguyen Thi Minh  and
      Trang, Nguyen Thi Thu",
    booktitle = "Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing",
    month = oct,
    year = "2025",
    address = "Hanoi, Vietnam",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.vlsp-1.6/",
    pages = "36--44"
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="truong-etal-2025-dfat">
    <titleInfo>
        <title>DFAT: Dual-stage Fusion of Acoustic and Text feature for Speech Emotion Recognition</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Nhi</namePart>
        <namePart type="given">Nguyen</namePart>
        <namePart type="given">Yen</namePart>
        <namePart type="family">Truong</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Sang</namePart>
        <namePart type="given">Le</namePart>
        <namePart type="family">Quang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Huy</namePart>
        <namePart type="given">Tran</namePart>
        <namePart type="family">Quang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Tri</namePart>
        <namePart type="given">Pham</namePart>
        <namePart type="family">Xuan</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Duong</namePart>
        <namePart type="given">Tran</namePart>
        <namePart type="family">Ham</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Binh</namePart>
        <namePart type="given">Tran</namePart>
        <namePart type="given">Le</namePart>
        <namePart type="family">Hai</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Tin</namePart>
        <namePart type="family">Huynh</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Kiem</namePart>
        <namePart type="family">Hoang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2025-10</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Luong</namePart>
            <namePart type="given">Chi</namePart>
            <namePart type="family">Mai</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Nguyen</namePart>
            <namePart type="given">Thi</namePart>
            <namePart type="given">Minh</namePart>
            <namePart type="family">Huyen</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Nguyen</namePart>
            <namePart type="given">Thi</namePart>
            <namePart type="given">Thu</namePart>
            <namePart type="family">Trang</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Hanoi, Vietnam</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <identifier type="citekey">truong-etal-2025-dfat</identifier>
    <location>
        <url>https://aclanthology.org/2025.vlsp-1.6/</url>
    </location>
    <part>
        <date>2025-10</date>
        <extent unit="page">
            <start>36</start>
            <end>44</end>
        </extent>
    </part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T DFAT: Dual-stage Fusion of Acoustic and Text feature for Speech Emotion Recognition
%A Truong, Nhi Nguyen Yen
%A Quang, Sang Le
%A Quang, Huy Tran
%A Xuan, Tri Pham
%A Ham, Duong Tran
%A Hai, Binh Tran Le
%A Huynh, Tin
%A Hoang, Kiem
%Y Mai, Luong Chi
%Y Huyen, Nguyen Thi Minh
%Y Trang, Nguyen Thi Thu
%S Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing
%D 2025
%8 October
%I Association for Computational Linguistics
%C Hanoi, Vietnam
%F truong-etal-2025-dfat
%U https://aclanthology.org/2025.vlsp-1.6/
%P 36-44
