Robust Dictionary Lookup in Multiple Noisy Orthographies

Lingliang Zhang; Nizar Habash; Godfried Toussaint

doi:10.18653/v1/W17-1315

Abstract

We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard. We apply it to Arabic dictionary lookup with noisy queries done using both the Arabic and Roman scripts. Our algorithm is based on a computational phonetic distance metric that can be optionally machine learned. To benchmark our performance, we created the ArabScribe dataset, containing 10,000 noisy transcriptions of random Arabic dictionary words. Our algorithm outperforms Google Translate’s “did you mean” feature, as well as the Yamli smart Arabic keyboard.

Anthology ID: W17-1315
Volume: Proceedings of the Third Arabic Natural Language Processing Workshop
Month: April
Year: 2017
Address: Valencia, Spain
Editors: Nizar Habash, Mona Diab, Kareem Darwish, Wassim El-Hajj, Hend Al-Khalifa, Houda Bouamor, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue: WANLP
SIGs: SEMITIC | SIGARAB
Publisher: Association for Computational Linguistics
Pages: 119–129
URL: https://aclanthology.org/W17-1315/
DOI: 10.18653/v1/W17-1315
Bibkey: zhang-etal-2017-robust
Cite (ACL): Lingliang Zhang, Nizar Habash, and Godfried Toussaint. 2017. Robust Dictionary Lookup in Multiple Noisy Orthographies. In Proceedings of the Third Arabic Natural Language Processing Workshop, pages 119–129, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal): Robust Dictionary Lookup in Multiple Noisy Orthographies (Zhang et al., WANLP 2017)
PDF: https://aclanthology.org/W17-1315.pdf

@inproceedings{zhang-etal-2017-robust,
    title = "Robust Dictionary Lookup in Multiple Noisy Orthographies",
    author = "Zhang, Lingliang  and
      Habash, Nizar  and
      Toussaint, Godfried",
    editor = "Habash, Nizar  and
      Diab, Mona  and
      Darwish, Kareem  and
      El-Hajj, Wassim  and
      Al-Khalifa, Hend  and
      Bouamor, Houda  and
      Tomeh, Nadi  and
      El-Haj, Mahmoud  and
      Zaghouani, Wajdi",
    booktitle = "Proceedings of the Third {A}rabic Natural Language Processing Workshop",
    month = apr,
    year = "2017",
    address = "Valencia, Spain",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-1315/",
    doi = "10.18653/v1/W17-1315",
    pages = "119--129",
    abstract = "We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard. We apply it to Arabic dictionary lookup with noisy queries done using both the Arabic and Roman scripts. Our algorithm is based on a computational phonetic distance metric that can be optionally machine learned. To benchmark our performance, we created the ArabScribe dataset, containing 10,000 noisy transcriptions of random Arabic dictionary words. Our algorithm outperforms Google Translate{'}s ``did you mean'' feature, as well as the Yamli smart Arabic keyboard."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="zhang-etal-2017-robust">
    <titleInfo>
        <title>Robust Dictionary Lookup in Multiple Noisy Orthographies</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Lingliang</namePart>
        <namePart type="family">Zhang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Nizar</namePart>
        <namePart type="family">Habash</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Godfried</namePart>
        <namePart type="family">Toussaint</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2017-04</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Third Arabic Natural Language Processing Workshop</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Nizar</namePart>
            <namePart type="family">Habash</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mona</namePart>
            <namePart type="family">Diab</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Kareem</namePart>
            <namePart type="family">Darwish</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Wassim</namePart>
            <namePart type="family">El-Hajj</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Hend</namePart>
            <namePart type="family">Al-Khalifa</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Houda</namePart>
            <namePart type="family">Bouamor</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Nadi</namePart>
            <namePart type="family">Tomeh</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mahmoud</namePart>
            <namePart type="family">El-Haj</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Wajdi</namePart>
            <namePart type="family">Zaghouani</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Valencia, Spain</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard. We apply it to Arabic dictionary lookup with noisy queries done using both the Arabic and Roman scripts. Our algorithm is based on a computational phonetic distance metric that can be optionally machine learned. To benchmark our performance, we created the ArabScribe dataset, containing 10,000 noisy transcriptions of random Arabic dictionary words. Our algorithm outperforms Google Translate’s “did you mean” feature, as well as the Yamli smart Arabic keyboard.</abstract>
    <identifier type="citekey">zhang-etal-2017-robust</identifier>
    <identifier type="doi">10.18653/v1/W17-1315</identifier>
    <location>
        <url>https://aclanthology.org/W17-1315/</url>
    </location>
    <part>
        <date>2017-04</date>
        <extent unit="page">
            <start>119</start>
            <end>129</end>
        </extent>
    </part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T Robust Dictionary Lookup in Multiple Noisy Orthographies
%A Zhang, Lingliang
%A Habash, Nizar
%A Toussaint, Godfried
%Y Habash, Nizar
%Y Diab, Mona
%Y Darwish, Kareem
%Y El-Hajj, Wassim
%Y Al-Khalifa, Hend
%Y Bouamor, Houda
%Y Tomeh, Nadi
%Y El-Haj, Mahmoud
%Y Zaghouani, Wajdi
%S Proceedings of the Third Arabic Natural Language Processing Workshop
%D 2017
%8 April
%I Association for Computational Linguistics
%C Valencia, Spain
%F zhang-etal-2017-robust
%X We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard. We apply it to Arabic dictionary lookup with noisy queries done using both the Arabic and Roman scripts. Our algorithm is based on a computational phonetic distance metric that can be optionally machine learned. To benchmark our performance, we created the ArabScribe dataset, containing 10,000 noisy transcriptions of random Arabic dictionary words. Our algorithm outperforms Google Translate’s “did you mean” feature, as well as the Yamli smart Arabic keyboard.
%R 10.18653/v1/W17-1315
%U https://aclanthology.org/W17-1315/
%U https://doi.org/10.18653/v1/W17-1315
%P 119-129

Markdown (Informal)

[Robust Dictionary Lookup in Multiple Noisy Orthographies](https://aclanthology.org/W17-1315/) (Zhang et al., WANLP 2017)