Joint Generation of Captions and Subtitles with Dual Decoding

Jitao Xu; François Buet; Josep M. Crego; Elise Bertin-Lemée; François Yvon

doi:10.18653/v1/2022.iwslt-1.7

Abstract

As the amount of audio-visual content increases, the need to develop automatic captioning and subtitling solutions to match the expectations of a growing international audience appears as the only viable way to boost throughput and lower the related post-production costs. Automatic captioning and subtitling often need to be tightly intertwined to achieve an appropriate level of consistency and synchronization with each other and with the video signal. In this work, we assess a dual decoding scheme to achieve a strong coupling between these two tasks and show how adequacy and consistency are increased, with virtually no additional cost in terms of model size and training complexity.

Anthology ID: 2022.iwslt-1.7
Volume: Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
Month: May
Year: 2022
Address: Dublin, Ireland (in-person and online)
Editors: Elizabeth Salesky, Marcello Federico, Marta Costa-jussà
Venue: IWSLT
SIG: SIGSLT
Publisher: Association for Computational Linguistics
Pages: 74–82
URL: https://aclanthology.org/2022.iwslt-1.7/
DOI: 10.18653/v1/2022.iwslt-1.7
Bibkey: xu-etal-2022-joint
Cite (ACL): Jitao Xu, François Buet, Josep Crego, Elise Bertin-Lemée, and François Yvon. 2022. Joint Generation of Captions and Subtitles with Dual Decoding. In Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022), pages 74–82, Dublin, Ireland (in-person and online). Association for Computational Linguistics.
Cite (Informal): Joint Generation of Captions and Subtitles with Dual Decoding (Xu et al., IWSLT 2022)
PDF: https://aclanthology.org/2022.iwslt-1.7.pdf

BibTeX

@inproceedings{xu-etal-2022-joint,
    title = "Joint Generation of Captions and Subtitles with Dual Decoding",
    author = "Xu, Jitao  and
      Buet, Fran{\c{c}}ois  and
      Crego, Josep  and
      Bertin-Lem{\'e}e, Elise  and
      Yvon, Fran{\c{c}}ois",
    editor = "Salesky, Elizabeth  and
      Federico, Marcello  and
      Costa-juss{\`a}, Marta",
    booktitle = "Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland (in-person and online)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.iwslt-1.7/",
    doi = "10.18653/v1/2022.iwslt-1.7",
    pages = "74--82",
    abstract = "As the amount of audio-visual content increases, the need to develop automatic captioning and subtitling solutions to match the expectations of a growing international audience appears as the only viable way to boost throughput and lower the related post-production costs. Automatic captioning and subtitling often need to be tightly intertwined to achieve an appropriate level of consistency and synchronization with each other and with the video signal. In this work, we assess a dual decoding scheme to achieve a strong coupling between these two tasks and show how adequacy and consistency are increased, with virtually no additional cost in terms of model size and training complexity."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="xu-etal-2022-joint">
    <titleInfo>
        <title>Joint Generation of Captions and Subtitles with Dual Decoding</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Jitao</namePart>
        <namePart type="family">Xu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">François</namePart>
        <namePart type="family">Buet</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Josep</namePart>
        <namePart type="family">Crego</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Elise</namePart>
        <namePart type="family">Bertin-Lemée</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">François</namePart>
        <namePart type="family">Yvon</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2022-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Elizabeth</namePart>
            <namePart type="family">Salesky</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marcello</namePart>
            <namePart type="family">Federico</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marta</namePart>
            <namePart type="family">Costa-jussà</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Dublin, Ireland (in-person and online)</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>As the amount of audio-visual content increases, the need to develop automatic captioning and subtitling solutions to match the expectations of a growing international audience appears as the only viable way to boost throughput and lower the related post-production costs. Automatic captioning and subtitling often need to be tightly intertwined to achieve an appropriate level of consistency and synchronization with each other and with the video signal. In this work, we assess a dual decoding scheme to achieve a strong coupling between these two tasks and show how adequacy and consistency are increased, with virtually no additional cost in terms of model size and training complexity.</abstract>
    <identifier type="citekey">xu-etal-2022-joint</identifier>
    <identifier type="doi">10.18653/v1/2022.iwslt-1.7</identifier>
    <location>
        <url>https://aclanthology.org/2022.iwslt-1.7/</url>
    </location>
    <part>
        <date>2022-05</date>
        <extent unit="page">
            <start>74</start>
            <end>82</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Joint Generation of Captions and Subtitles with Dual Decoding
%A Xu, Jitao
%A Buet, François
%A Crego, Josep
%A Bertin-Lemée, Elise
%A Yvon, François
%Y Salesky, Elizabeth
%Y Federico, Marcello
%Y Costa-jussà, Marta
%S Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
%D 2022
%8 May
%I Association for Computational Linguistics
%C Dublin, Ireland (in-person and online)
%F xu-etal-2022-joint
%X As the amount of audio-visual content increases, the need to develop automatic captioning and subtitling solutions to match the expectations of a growing international audience appears as the only viable way to boost throughput and lower the related post-production costs. Automatic captioning and subtitling often need to be tightly intertwined to achieve an appropriate level of consistency and synchronization with each other and with the video signal. In this work, we assess a dual decoding scheme to achieve a strong coupling between these two tasks and show how adequacy and consistency are increased, with virtually no additional cost in terms of model size and training complexity.
%R 10.18653/v1/2022.iwslt-1.7
%U https://aclanthology.org/2022.iwslt-1.7/
%U https://doi.org/10.18653/v1/2022.iwslt-1.7
%P 74-82

Markdown (Informal)

[Joint Generation of Captions and Subtitles with Dual Decoding](https://aclanthology.org/2022.iwslt-1.7/) (Xu et al., IWSLT 2022)
