Log-linear Models for Uyghur Segmentation in Spoken Language Translation

Chenggang Mi; Yating Yang; Rui Dong; Xi Zhou; Lei Wang; Xiao Li; Tonghai Jiang

doi:10.26615/978-954-452-049-6_065

Abstract

To alleviate data sparsity in spoken Uyghur machine translation, we propose a log-linear-based morphological segmentation approach. Instead of learning the model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken translation using both bilingual and monolingual corpora. Our approach relies on several features, including a traditional conditional random field (CRF) feature, a bilingual word alignment feature, and a monolingual suffix-word co-occurrence feature. Experimental results show that our proposed segmentation model for Uyghur spoken translation achieves a 1.6 BLEU score improvement over the state-of-the-art baseline.
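The abstract names the ingredients of the model but not its exact form. As a rough illustration only, the sketch below shows how a generic log-linear model could combine a CRF feature, a bilingual word alignment feature, and a suffix-word co-occurrence feature to rank candidate segmentations. The function name, feature names, weights, feature values, and placeholder candidates are all hypothetical and are not taken from the paper.

```python
import math


def log_linear_probs(candidates, weights):
    """Return (segmentation, probability) pairs, with
    P(segmentation) proportional to exp(sum_k weights[k] * feature_k)."""
    scores = [
        sum(weights[name] * value for name, value in feats.items())
        for _, feats in candidates
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    return [(seg, e / total) for (seg, _), e in zip(candidates, exp_scores)]


# Hypothetical feature weights (in practice these would be tuned on held-out data).
weights = {"crf": 1.0, "alignment": 0.5, "cooccurrence": 0.5}

# Hypothetical candidate segmentations of one word, with made-up feature values.
candidates = [
    ("stem+suffix", {"crf": -0.4, "alignment": 0.8, "cooccurrence": 0.6}),
    ("stem+suf+fix", {"crf": -1.2, "alignment": 0.3, "cooccurrence": 0.1}),
    ("stemsuffix", {"crf": -0.9, "alignment": 0.1, "cooccurrence": 0.0}),
]

probs = log_linear_probs(candidates, weights)
best_segmentation = max(probs, key=lambda pair: pair[1])[0]
print(best_segmentation)
```

Under these assumed values the first candidate receives the highest weighted score, so it is selected. The appeal of a log-linear formulation is that further features can be added as extra weighted terms without changing the scoring code.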

Anthology ID: R17-1065
Volume: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month: September
Year: 2017
Address: Varna, Bulgaria
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 492–500
URL: https://doi.org/10.26615/978-954-452-049-6_065
DOI: 10.26615/978-954-452-049-6_065
Cite (ACL): Chenggang Mi, Yating Yang, Rui Dong, Xi Zhou, Lei Wang, Xiao Li, and Tonghai Jiang. 2017. Log-linear Models for Uyghur Segmentation in Spoken Language Translation. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 492–500, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal): Log-linear Models for Uyghur Segmentation in Spoken Language Translation (Mi et al., RANLP 2017)
PDF: https://doi.org/10.26615/978-954-452-049-6_065

Export citation

BibTeX

@inproceedings{mi-etal-2017-log,
    title = "Log-linear Models for {U}yghur Segmentation in Spoken Language Translation",
    author = "Mi, Chenggang  and
      Yang, Yating  and
      Dong, Rui  and
      Zhou, Xi  and
      Wang, Lei  and
      Li, Xiao  and
      Jiang, Tonghai",
    editor = "Mitkov, Ruslan  and
      Angelova, Galia",
    booktitle = "Proceedings of the International Conference Recent Advances in Natural Language Processing, {RANLP} 2017",
    month = sep,
    year = "2017",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd.",
    url = "https://aclanthology.org/R17-1065/",
    doi = "10.26615/978-954-452-049-6_065",
    pages = "492--500",
    abstract = "To alleviate data sparsity in spoken Uyghur machine translation, we propose a log-linear-based morphological segmentation approach. Instead of learning the model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken translation using both bilingual and monolingual corpora. Our approach relies on several features, including a traditional conditional random field (CRF) feature, a bilingual word alignment feature, and a monolingual suffix-word co-occurrence feature. Experimental results show that our proposed segmentation model for Uyghur spoken translation achieves a 1.6 BLEU score improvement over the state-of-the-art baseline."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="mi-etal-2017-log">
    <titleInfo>
        <title>Log-linear Models for Uyghur Segmentation in Spoken Language Translation</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Chenggang</namePart>
        <namePart type="family">Mi</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yating</namePart>
        <namePart type="family">Yang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Rui</namePart>
        <namePart type="family">Dong</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Xi</namePart>
        <namePart type="family">Zhou</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Lei</namePart>
        <namePart type="family">Wang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Xiao</namePart>
        <namePart type="family">Li</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Tonghai</namePart>
        <namePart type="family">Jiang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2017-09</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Ruslan</namePart>
            <namePart type="family">Mitkov</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Galia</namePart>
            <namePart type="family">Angelova</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>INCOMA Ltd.</publisher>
            <place>
                <placeTerm type="text">Varna, Bulgaria</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>To alleviate data sparsity in spoken Uyghur machine translation, we propose a log-linear-based morphological segmentation approach. Instead of learning the model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken translation using both bilingual and monolingual corpora. Our approach relies on several features, including a traditional conditional random field (CRF) feature, a bilingual word alignment feature, and a monolingual suffix-word co-occurrence feature. Experimental results show that our proposed segmentation model for Uyghur spoken translation achieves a 1.6 BLEU score improvement over the state-of-the-art baseline.</abstract>
    <identifier type="citekey">mi-etal-2017-log</identifier>
    <identifier type="doi">10.26615/978-954-452-049-6_065</identifier>
    <location>
        <url>https://aclanthology.org/R17-1065/</url>
    </location>
    <part>
        <date>2017-09</date>
        <extent unit="page">
            <start>492</start>
            <end>500</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Log-linear Models for Uyghur Segmentation in Spoken Language Translation
%A Mi, Chenggang
%A Yang, Yating
%A Dong, Rui
%A Zhou, Xi
%A Wang, Lei
%A Li, Xiao
%A Jiang, Tonghai
%Y Mitkov, Ruslan
%Y Angelova, Galia
%S Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
%D 2017
%8 September
%I INCOMA Ltd.
%C Varna, Bulgaria
%F mi-etal-2017-log
%X To alleviate data sparsity in spoken Uyghur machine translation, we propose a log-linear-based morphological segmentation approach. Instead of learning the model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken translation using both bilingual and monolingual corpora. Our approach relies on several features, including a traditional conditional random field (CRF) feature, a bilingual word alignment feature, and a monolingual suffix-word co-occurrence feature. Experimental results show that our proposed segmentation model for Uyghur spoken translation achieves a 1.6 BLEU score improvement over the state-of-the-art baseline.
%R 10.26615/978-954-452-049-6_065
%U https://aclanthology.org/R17-1065/
%U https://doi.org/10.26615/978-954-452-049-6_065
%P 492-500

Preformatted

Markdown (Informal)

[Log-linear Models for Uyghur Segmentation in Spoken Language Translation](https://aclanthology.org/R17-1065/) (Mi et al., RANLP 2017)
