@inproceedings{akinfaderin-2020-hausamt,
    title = "{H}ausa{MT} v1.0: Towards {E}nglish{--}{H}ausa Neural Machine Translation",
    author = "Akinfaderin, Adewale",
    editor = "Cunha, Rossana and
      Shaikh, Samira and
      Varis, Erika and
      Georgi, Ryan and
      Tsai, Alicia and
      Anastasopoulos, Antonios and
      Chandu, Khyathi Raghavi",
    booktitle = "Proceedings of the Fourth Widening Natural Language Processing Workshop",
    month = jul,
    year = "2020",
    address = "Seattle, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.winlp-1.38",
    doi = "10.18653/v1/2020.winlp-1.38",
    pages = "144--147",
    abstract = "Neural Machine Translation (NMT) for low-resource languages suffers from low performance because large amounts of parallel data are scarce and language diversity is limited. To help ameliorate this problem, we built a baseline model for English{--}Hausa machine translation, a low-resource translation task. Hausa is the second largest Afro-Asiatic language in the world after Arabic, and it is the third largest language of trade, after English and French, across a large swath of West African countries. In this paper, we curated several datasets containing Hausa{--}English parallel text for our translation task. We trained baseline models and evaluated their performance using recurrent and Transformer encoder-decoder architectures with two tokenization approaches: standard word-level tokenization and Byte Pair Encoding (BPE) subword tokenization.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="akinfaderin-2020-hausamt">
    <titleInfo>
        <title>HausaMT v1.0: Towards English–Hausa Neural Machine Translation</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Adewale</namePart>
        <namePart type="family">Akinfaderin</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Fourth Widening Natural Language Processing Workshop</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Rossana</namePart>
            <namePart type="family">Cunha</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Samira</namePart>
            <namePart type="family">Shaikh</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Erika</namePart>
            <namePart type="family">Varis</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ryan</namePart>
            <namePart type="family">Georgi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alicia</namePart>
            <namePart type="family">Tsai</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Antonios</namePart>
            <namePart type="family">Anastasopoulos</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Khyathi</namePart>
            <namePart type="given">Raghavi</namePart>
            <namePart type="family">Chandu</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Seattle, USA</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Neural Machine Translation (NMT) for low-resource languages suffers from low performance because large amounts of parallel data are scarce and language diversity is limited. To help ameliorate this problem, we built a baseline model for English–Hausa machine translation, a low-resource translation task. Hausa is the second largest Afro-Asiatic language in the world after Arabic, and it is the third largest language of trade, after English and French, across a large swath of West African countries. In this paper, we curated several datasets containing Hausa–English parallel text for our translation task. We trained baseline models and evaluated their performance using recurrent and Transformer encoder-decoder architectures with two tokenization approaches: standard word-level tokenization and Byte Pair Encoding (BPE) subword tokenization.</abstract>
    <identifier type="citekey">akinfaderin-2020-hausamt</identifier>
    <identifier type="doi">10.18653/v1/2020.winlp-1.38</identifier>
    <location>
        <url>https://aclanthology.org/2020.winlp-1.38</url>
    </location>
    <part>
        <date>2020-07</date>
        <extent unit="page">
            <start>144</start>
            <end>147</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T HausaMT v1.0: Towards English–Hausa Neural Machine Translation
%A Akinfaderin, Adewale
%Y Cunha, Rossana
%Y Shaikh, Samira
%Y Varis, Erika
%Y Georgi, Ryan
%Y Tsai, Alicia
%Y Anastasopoulos, Antonios
%Y Chandu, Khyathi Raghavi
%S Proceedings of the Fourth Widening Natural Language Processing Workshop
%D 2020
%8 July
%I Association for Computational Linguistics
%C Seattle, USA
%F akinfaderin-2020-hausamt
%X Neural Machine Translation (NMT) for low-resource languages suffers from low performance because large amounts of parallel data are scarce and language diversity is limited. To help ameliorate this problem, we built a baseline model for English–Hausa machine translation, a low-resource translation task. Hausa is the second largest Afro-Asiatic language in the world after Arabic, and it is the third largest language of trade, after English and French, across a large swath of West African countries. In this paper, we curated several datasets containing Hausa–English parallel text for our translation task. We trained baseline models and evaluated their performance using recurrent and Transformer encoder-decoder architectures with two tokenization approaches: standard word-level tokenization and Byte Pair Encoding (BPE) subword tokenization.
%R 10.18653/v1/2020.winlp-1.38
%U https://aclanthology.org/2020.winlp-1.38
%U https://doi.org/10.18653/v1/2020.winlp-1.38
%P 144-147
Markdown (Informal)
[HausaMT v1.0: Towards English–Hausa Neural Machine Translation](https://aclanthology.org/2020.winlp-1.38) (Akinfaderin, WiNLP 2020)
ACL
Adewale Akinfaderin. 2020. HausaMT v1.0: Towards English–Hausa Neural Machine Translation. In Proceedings of the Fourth Widening Natural Language Processing Workshop, pages 144–147, Seattle, USA. Association for Computational Linguistics.