Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data

Kosuke Doi; Katsuhito Sudoh; Satoshi Nakamura

doi:10.18653/v1/2021.iwslt-1.27

Abstract

This paper describes the construction of a new large-scale English-Japanese Simultaneous Interpretation (SI) corpus and presents the results of its analysis. A portion of the corpus contains SI data from three interpreters with different amounts of experience. Some of the SI data were manually aligned with the source speeches at the sentence level. Their latency, quality, and word order aspects were compared among the SI data themselves as well as against offline translations. The results showed that (1) interpreters with more experience controlled the latency and quality better, and (2) large latency hurt the SI quality.

Anthology ID: 2021.iwslt-1.27
Volume: Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
Month: August
Year: 2021
Address: Bangkok, Thailand (online)
Editors: Marcello Federico, Alex Waibel, Marta R. Costa-jussà, Jan Niehues, Sebastian Stuker, Elizabeth Salesky
Venue: IWSLT
SIG: SIGSLT
Publisher: Association for Computational Linguistics
Pages: 226–235
URL: https://aclanthology.org/2021.iwslt-1.27/
DOI: 10.18653/v1/2021.iwslt-1.27
Cite (ACL): Kosuke Doi, Katsuhito Sudoh, and Satoshi Nakamura. 2021. Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data. In Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021), pages 226–235, Bangkok, Thailand (online). Association for Computational Linguistics.
Cite (Informal): Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data (Doi et al., IWSLT 2021)
PDF: https://aclanthology.org/2021.iwslt-1.27.pdf

BibTeX

@inproceedings{doi-etal-2021-large,
    title = "Large-Scale {E}nglish-{J}apanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data",
    author = "Doi, Kosuke  and
      Sudoh, Katsuhito  and
      Nakamura, Satoshi",
    editor = "Federico, Marcello  and
      Waibel, Alex  and
      Costa-juss{\`a}, Marta R.  and
      Niehues, Jan  and
      Stuker, Sebastian  and
      Salesky, Elizabeth",
    booktitle = "Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)",
    month = aug,
    year = "2021",
    address = "Bangkok, Thailand (online)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.iwslt-1.27/",
    doi = "10.18653/v1/2021.iwslt-1.27",
    pages = "226--235",
    abstract = "This paper describes the construction of a new large-scale English-Japanese Simultaneous Interpretation (SI) corpus and presents the results of its analysis. A portion of the corpus contains SI data from three interpreters with different amounts of experience. Some of the SI data were manually aligned with the source speeches at the sentence level. Their latency, quality, and word order aspects were compared among the SI data themselves as well as against offline translations. The results showed that (1) interpreters with more experience controlled the latency and quality better, and (2) large latency hurt the SI quality."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="doi-etal-2021-large">
    <titleInfo>
        <title>Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Kosuke</namePart>
        <namePart type="family">Doi</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Katsuhito</namePart>
        <namePart type="family">Sudoh</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Satoshi</namePart>
        <namePart type="family">Nakamura</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2021-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Marcello</namePart>
            <namePart type="family">Federico</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alex</namePart>
            <namePart type="family">Waibel</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marta</namePart>
            <namePart type="given">R</namePart>
            <namePart type="family">Costa-jussà</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jan</namePart>
            <namePart type="family">Niehues</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Sebastian</namePart>
            <namePart type="family">Stuker</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Elizabeth</namePart>
            <namePart type="family">Salesky</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Bangkok, Thailand (online)</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>This paper describes the construction of a new large-scale English-Japanese Simultaneous Interpretation (SI) corpus and presents the results of its analysis. A portion of the corpus contains SI data from three interpreters with different amounts of experience. Some of the SI data were manually aligned with the source speeches at the sentence level. Their latency, quality, and word order aspects were compared among the SI data themselves as well as against offline translations. The results showed that (1) interpreters with more experience controlled the latency and quality better, and (2) large latency hurt the SI quality.</abstract>
    <identifier type="citekey">doi-etal-2021-large</identifier>
    <identifier type="doi">10.18653/v1/2021.iwslt-1.27</identifier>
    <location>
        <url>https://aclanthology.org/2021.iwslt-1.27/</url>
    </location>
    <part>
        <date>2021-08</date>
        <extent unit="page">
            <start>226</start>
            <end>235</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data
%A Doi, Kosuke
%A Sudoh, Katsuhito
%A Nakamura, Satoshi
%Y Federico, Marcello
%Y Waibel, Alex
%Y Costa-jussà, Marta R.
%Y Niehues, Jan
%Y Stuker, Sebastian
%Y Salesky, Elizabeth
%S Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
%D 2021
%8 August
%I Association for Computational Linguistics
%C Bangkok, Thailand (online)
%F doi-etal-2021-large
%X This paper describes the construction of a new large-scale English-Japanese Simultaneous Interpretation (SI) corpus and presents the results of its analysis. A portion of the corpus contains SI data from three interpreters with different amounts of experience. Some of the SI data were manually aligned with the source speeches at the sentence level. Their latency, quality, and word order aspects were compared among the SI data themselves as well as against offline translations. The results showed that (1) interpreters with more experience controlled the latency and quality better, and (2) large latency hurt the SI quality.
%R 10.18653/v1/2021.iwslt-1.27
%U https://aclanthology.org/2021.iwslt-1.27/
%U https://doi.org/10.18653/v1/2021.iwslt-1.27
%P 226-235

Markdown (Informal)

[Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data](https://aclanthology.org/2021.iwslt-1.27/) (Doi et al., IWSLT 2021)

ACL

Kosuke Doi, Katsuhito Sudoh, and Satoshi Nakamura. 2021. Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data. In Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021), pages 226–235, Bangkok, Thailand (online). Association for Computational Linguistics.