Byte-Level Massively Multilingual Semantic Parsing

Massimo Nicosia; Francesco Piccinno

doi:10.18653/v1/2022.mmnlu-1.3

Abstract

Token free approaches have been successfully applied to a series of word and span level tasks. In this work, we evaluate a byte-level sequence to sequence model (ByT5) on the 51 languages in the MASSIVE multilingual semantic parsing dataset. We examine multiple experimental settings: (i) zero-shot, (ii) full gold data and (iii) zero-shot with synthetic data. By leveraging a state-of-the-art label projection method for machine translated examples, we are able to reduce the gap in exact match to only 5 points with respect to a model trained on gold data from all the languages. We additionally provide insights on the cross-lingual transfer of ByT5 and show how the model compares with respect to mT5 across all parameter sizes.

Anthology ID: 2022.mmnlu-1.3
Volume: Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22)
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Jack FitzGerald, Kay Rottmann, Julia Hirschberg, Mohit Bansal, Anna Rumshisky, Charith Peris, Christopher Hench
Venue: MMNLU
Publisher: Association for Computational Linguistics
Pages: 25–34
URL: https://aclanthology.org/2022.mmnlu-1.3/
DOI: 10.18653/v1/2022.mmnlu-1.3
Cite (ACL): Massimo Nicosia and Francesco Piccinno. 2022. Byte-Level Massively Multilingual Semantic Parsing. In Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22), pages 25–34, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): Byte-Level Massively Multilingual Semantic Parsing (Nicosia & Piccinno, MMNLU 2022)
PDF: https://aclanthology.org/2022.mmnlu-1.3.pdf

BibTeX

@inproceedings{nicosia-piccinno-2022-byte,
    title = "Byte-Level Massively Multilingual Semantic Parsing",
    author = "Nicosia, Massimo  and
      Piccinno, Francesco",
    editor = "FitzGerald, Jack  and
      Rottmann, Kay  and
      Hirschberg, Julia  and
      Bansal, Mohit  and
      Rumshisky, Anna  and
      Peris, Charith  and
      Hench, Christopher",
    booktitle = "Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22)",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.mmnlu-1.3/",
    doi = "10.18653/v1/2022.mmnlu-1.3",
    pages = "25--34",
    abstract = "Token free approaches have been successfully applied to a series of word and span level tasks. In this work, we evaluate a byte-level sequence to sequence model (ByT5) on the 51 languages in the MASSIVE multilingual semantic parsing dataset. We examine multiple experimental settings: (i) zero-shot, (ii) full gold data and (iii) zero-shot with synthetic data. By leveraging a state-of-the-art label projection method for machine translated examples, we are able to reduce the gap in exact match to only 5 points with respect to a model trained on gold data from all the languages. We additionally provide insights on the cross-lingual transfer of ByT5 and show how the model compares with respect to mT5 across all parameter sizes."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="nicosia-piccinno-2022-byte">
    <titleInfo>
        <title>Byte-Level Massively Multilingual Semantic Parsing</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Massimo</namePart>
        <namePart type="family">Nicosia</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Francesco</namePart>
        <namePart type="family">Piccinno</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2022-12</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Jack</namePart>
            <namePart type="family">FitzGerald</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Kay</namePart>
            <namePart type="family">Rottmann</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Julia</namePart>
            <namePart type="family">Hirschberg</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mohit</namePart>
            <namePart type="family">Bansal</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Anna</namePart>
            <namePart type="family">Rumshisky</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Charith</namePart>
            <namePart type="family">Peris</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Christopher</namePart>
            <namePart type="family">Hench</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Abu Dhabi, United Arab Emirates (Hybrid)</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Token free approaches have been successfully applied to a series of word and span level tasks. In this work, we evaluate a byte-level sequence to sequence model (ByT5) on the 51 languages in the MASSIVE multilingual semantic parsing dataset. We examine multiple experimental settings: (i) zero-shot, (ii) full gold data and (iii) zero-shot with synthetic data. By leveraging a state-of-the-art label projection method for machine translated examples, we are able to reduce the gap in exact match to only 5 points with respect to a model trained on gold data from all the languages. We additionally provide insights on the cross-lingual transfer of ByT5 and show how the model compares with respect to mT5 across all parameter sizes.</abstract>
    <identifier type="citekey">nicosia-piccinno-2022-byte</identifier>
    <identifier type="doi">10.18653/v1/2022.mmnlu-1.3</identifier>
    <location>
        <url>https://aclanthology.org/2022.mmnlu-1.3/</url>
    </location>
    <part>
        <date>2022-12</date>
        <extent unit="page">
            <start>25</start>
            <end>34</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Byte-Level Massively Multilingual Semantic Parsing
%A Nicosia, Massimo
%A Piccinno, Francesco
%Y FitzGerald, Jack
%Y Rottmann, Kay
%Y Hirschberg, Julia
%Y Bansal, Mohit
%Y Rumshisky, Anna
%Y Peris, Charith
%Y Hench, Christopher
%S Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22)
%D 2022
%8 December
%I Association for Computational Linguistics
%C Abu Dhabi, United Arab Emirates (Hybrid)
%F nicosia-piccinno-2022-byte
%X Token free approaches have been successfully applied to a series of word and span level tasks. In this work, we evaluate a byte-level sequence to sequence model (ByT5) on the 51 languages in the MASSIVE multilingual semantic parsing dataset. We examine multiple experimental settings: (i) zero-shot, (ii) full gold data and (iii) zero-shot with synthetic data. By leveraging a state-of-the-art label projection method for machine translated examples, we are able to reduce the gap in exact match to only 5 points with respect to a model trained on gold data from all the languages. We additionally provide insights on the cross-lingual transfer of ByT5 and show how the model compares with respect to mT5 across all parameter sizes.
%R 10.18653/v1/2022.mmnlu-1.3
%U https://aclanthology.org/2022.mmnlu-1.3/
%U https://doi.org/10.18653/v1/2022.mmnlu-1.3
%P 25-34

Markdown (Informal)

[Byte-Level Massively Multilingual Semantic Parsing](https://aclanthology.org/2022.mmnlu-1.3/) (Nicosia & Piccinno, MMNLU 2022)

ACL

Massimo Nicosia and Francesco Piccinno. 2022. Byte-Level Massively Multilingual Semantic Parsing. In Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22), pages 25–34, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.