A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models

Iwona Christop; Mateusz Czyżnikiewicz; Paweł Skórzewski; Łukasz Bondaruk; Jakub Kubiak; Marcin Lewandowski; Marek Kubis

Abstract

The present benchmarks for testing the audio modality of multimodal large language models concentrate on testing various audio tasks such as speaker diarization or gender identification in isolation. Whether a multimodal model can answer the questions that require reasoning skills to combine audio tasks of different categories cannot be verified with their use. To address this issue, we propose Audio Reasoning Tasks (ART), a new benchmark for assessing the ability of multimodal models to solve problems that require reasoning over audio signal.

Anthology ID: 2026.eacl-long.42
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 953–983
URL: https://aclanthology.org/2026.eacl-long.42/
Bibkey: christop-etal-2026-benchmark
Cite (ACL): Iwona Christop, Mateusz Czyżnikiewicz, Paweł Skórzewski, Łukasz Bondaruk, Jakub Kubiak, Marcin Lewandowski, and Marek Kubis. 2026. A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 953–983, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models (Christop et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-long.42.pdf
Checklist: 2026.eacl-long.42.checklist.pdf

Export citation

BibTeX

@inproceedings{christop-etal-2026-benchmark,
    title = "A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models",
    author = "Christop, Iwona  and
      Czy{\.z}nikiewicz, Mateusz  and
      Sk{\'o}rzewski, Pawe{\l}  and
      Bondaruk, {\L}ukasz  and
      Kubiak, Jakub  and
      Lewandowski, Marcin  and
      Kubis, Marek",
    editor = "Demberg, Vera  and
      Inui, Kentaro  and
      Marquez, Llu{\'i}s",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-long.42/",
    pages = "953--983",
    ISBN = "979-8-89176-380-7",
    abstract = "The present benchmarks for testing the audio modality of multimodal large language models concentrate on testing various audio tasks such as speaker diarization or gender identification in isolation. Whether a multimodal model can answer the questions that require reasoning skills to combine audio tasks of different categories cannot be verified with their use. To address this issue, we propose Audio Reasoning Tasks (ART), a new benchmark for assessing the ability of multimodal models to solve problems that require reasoning over audio signal."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="christop-etal-2026-benchmark">
    <titleInfo>
        <title>A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Iwona</namePart>
        <namePart type="family">Christop</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Mateusz</namePart>
        <namePart type="family">Czyżnikiewicz</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Paweł</namePart>
        <namePart type="family">Skórzewski</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Łukasz</namePart>
        <namePart type="family">Bondaruk</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jakub</namePart>
        <namePart type="family">Kubiak</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Marcin</namePart>
        <namePart type="family">Lewandowski</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Marek</namePart>
        <namePart type="family">Kubis</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2026-03</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Vera</namePart>
            <namePart type="family">Demberg</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Kentaro</namePart>
            <namePart type="family">Inui</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Lluís</namePart>
            <namePart type="family">Marquez</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Rabat, Morocco</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">979-8-89176-380-7</identifier>
    </relatedItem>
    <abstract>The present benchmarks for testing the audio modality of multimodal large language models concentrate on testing various audio tasks such as speaker diarization or gender identification in isolation. Whether a multimodal model can answer the questions that require reasoning skills to combine audio tasks of different categories cannot be verified with their use. To address this issue, we propose Audio Reasoning Tasks (ART), a new benchmark for assessing the ability of multimodal models to solve problems that require reasoning over audio signal.</abstract>
    <identifier type="citekey">christop-etal-2026-benchmark</identifier>
    <location>
        <url>https://aclanthology.org/2026.eacl-long.42/</url>
    </location>
    <part>
        <date>2026-03</date>
        <extent unit="page">
            <start>953</start>
            <end>983</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models
%A Christop, Iwona
%A Czyżnikiewicz, Mateusz
%A Skórzewski, Paweł
%A Bondaruk, Łukasz
%A Kubiak, Jakub
%A Lewandowski, Marcin
%A Kubis, Marek
%Y Demberg, Vera
%Y Inui, Kentaro
%Y Marquez, Lluís
%S Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2026
%8 March
%I Association for Computational Linguistics
%C Rabat, Morocco
%@ 979-8-89176-380-7
%F christop-etal-2026-benchmark
%X The present benchmarks for testing the audio modality of multimodal large language models concentrate on testing various audio tasks such as speaker diarization or gender identification in isolation. Whether a multimodal model can answer the questions that require reasoning skills to combine audio tasks of different categories cannot be verified with their use. To address this issue, we propose Audio Reasoning Tasks (ART), a new benchmark for assessing the ability of multimodal models to solve problems that require reasoning over audio signal.
%U https://aclanthology.org/2026.eacl-long.42/
%P 953-983

Markdown (Informal)

[A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models](https://aclanthology.org/2026.eacl-long.42/) (Christop et al., EACL 2026)
