Distilling Translations with Visual Awareness

Julia Ive; Pranava Swaroop Madhyastha; Lucia Specia

doi:10.18653/v1/P19-1653

Abstract

Previous work on multimodal machine translation has shown that visual information is only needed in very specific cases, for example in the presence of ambiguous words where the textual context is not sufficient. As a consequence, models tend to learn to ignore this information. We propose a translate-and-refine approach to this problem where images are only used by a second-stage decoder. This approach is trained jointly to generate a good first draft translation and to improve over this draft by (i) making better use of the target language textual context (both left and right-side contexts) and (ii) making use of visual context. This approach leads to state-of-the-art results. Additionally, we show that it has the ability to recover from erroneous or missing words in the source language.

Anthology ID: P19-1653
Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2019
Address: Florence, Italy
Editors: Anna Korhonen, David Traum, Lluís Màrquez
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 6525–6538
URL: https://aclanthology.org/P19-1653/
DOI: 10.18653/v1/P19-1653
Bibkey: ive-etal-2019-distilling
Cite (ACL): Julia Ive, Pranava Madhyastha, and Lucia Specia. 2019. Distilling Translations with Visual Awareness. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6525–6538, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Distilling Translations with Visual Awareness (Ive et al., ACL 2019)
PDF: https://aclanthology.org/P19-1653.pdf

Export citation

BibTeX

@inproceedings{ive-etal-2019-distilling,
    title = "Distilling Translations with Visual Awareness",
    author = "Ive, Julia  and
      Madhyastha, Pranava  and
      Specia, Lucia",
    editor = "Korhonen, Anna  and
      Traum, David  and
      M{\`a}rquez, Llu{\'i}s",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1653/",
    doi = "10.18653/v1/P19-1653",
    pages = "6525--6538",
    abstract = "Previous work on multimodal machine translation has shown that visual information is only needed in very specific cases, for example in the presence of ambiguous words where the textual context is not sufficient. As a consequence, models tend to learn to ignore this information. We propose a translate-and-refine approach to this problem where images are only used by a second stage decoder. This approach is trained jointly to generate a good first draft translation and to improve over this draft by (i) making better use of the target language textual context (both left and right-side contexts) and (ii) making use of visual context. This approach leads to the state of the art results. Additionally, we show that it has the ability to recover from erroneous or missing words in the source language."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ive-etal-2019-distilling">
    <titleInfo>
        <title>Distilling Translations with Visual Awareness</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Julia</namePart>
        <namePart type="family">Ive</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Pranava</namePart>
        <namePart type="family">Madhyastha</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Lucia</namePart>
        <namePart type="family">Specia</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2019-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Anna</namePart>
            <namePart type="family">Korhonen</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">David</namePart>
            <namePart type="family">Traum</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Lluís</namePart>
            <namePart type="family">Màrquez</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Florence, Italy</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Previous work on multimodal machine translation has shown that visual information is only needed in very specific cases, for example in the presence of ambiguous words where the textual context is not sufficient. As a consequence, models tend to learn to ignore this information. We propose a translate-and-refine approach to this problem where images are only used by a second stage decoder. This approach is trained jointly to generate a good first draft translation and to improve over this draft by (i) making better use of the target language textual context (both left and right-side contexts) and (ii) making use of visual context. This approach leads to the state of the art results. Additionally, we show that it has the ability to recover from erroneous or missing words in the source language.</abstract>
    <identifier type="citekey">ive-etal-2019-distilling</identifier>
    <identifier type="doi">10.18653/v1/P19-1653</identifier>
    <location>
        <url>https://aclanthology.org/P19-1653/</url>
    </location>
    <part>
        <date>2019-07</date>
        <extent unit="page">
            <start>6525</start>
            <end>6538</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Distilling Translations with Visual Awareness
%A Ive, Julia
%A Madhyastha, Pranava
%A Specia, Lucia
%Y Korhonen, Anna
%Y Traum, David
%Y Màrquez, Lluís
%S Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
%D 2019
%8 July
%I Association for Computational Linguistics
%C Florence, Italy
%F ive-etal-2019-distilling
%X Previous work on multimodal machine translation has shown that visual information is only needed in very specific cases, for example in the presence of ambiguous words where the textual context is not sufficient. As a consequence, models tend to learn to ignore this information. We propose a translate-and-refine approach to this problem where images are only used by a second stage decoder. This approach is trained jointly to generate a good first draft translation and to improve over this draft by (i) making better use of the target language textual context (both left and right-side contexts) and (ii) making use of visual context. This approach leads to the state of the art results. Additionally, we show that it has the ability to recover from erroneous or missing words in the source language.
%R 10.18653/v1/P19-1653
%U https://aclanthology.org/P19-1653/
%U https://doi.org/10.18653/v1/P19-1653
%P 6525-6538

Markdown (Informal)

[Distilling Translations with Visual Awareness](https://aclanthology.org/P19-1653/) (Ive et al., ACL 2019)
