Malayalam Speech Corpus: Design and Development for Dravidian Language

Lekshmi K R; Jithesh V S; Elizabeth Sherly

Abstract

To bridge the gap between theory and applications in language technology, for text and speech as well as several other areas, a well-designed and well-developed corpus is essential. Many problems and issues arise while developing a corpus, especially for low-resource languages. To the best of our knowledge, the Malayalam Speech Corpus (MSC) is one of the first open speech corpora for Automatic Speech Recognition (ASR) research. It consists of 250 hours of agricultural speech data. Along with the audio segments, we provide a transcription file, a lexicon, and annotated speech. It will be made available for public use upon request at “www.iiitmk.ac.in/vrclc/utilities/ml_speechcorpus”. This paper details the development and collection process for an agricultural-domain speech corpus in the Malayalam language.

Anthology ID: 2020.wildre-1.5
Volume: Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation
Month: May
Year: 2020
Address: Marseille, France
Editors: Girish Nath Jha, Kalika Bali, Sobha L., S. S. Agrawal, Atul Kr. Ojha
Venue: WILDRE
Publisher: European Language Resources Association (ELRA)
Pages: 25–28
Language: English
URL: https://aclanthology.org/2020.wildre-1.5/
Cite (ACL): Lekshmi K R, Jithesh V S, and Elizabeth Sherly. 2020. Malayalam Speech Corpus: Design and Development for Dravidian Language. In Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation, pages 25–28, Marseille, France. European Language Resources Association (ELRA).
Cite (Informal): Malayalam Speech Corpus: Design and Development for Dravidian Language (K R et al., WILDRE 2020)
PDF: https://aclanthology.org/2020.wildre-1.5.pdf


BibTeX

@inproceedings{k-r-etal-2020-malayalam,
    title = "{M}alayalam Speech Corpus: Design and Development for {D}ravidian Language",
    author = "K R, Lekshmi  and
      V S, Jithesh  and
      Sherly, Elizabeth",
    editor = "Jha, Girish Nath  and
      Bali, Kalika  and
      L., Sobha  and
      Agrawal, S. S.  and
      Ojha, Atul Kr.",
    booktitle = "Proceedings of the WILDRE5{--} 5th Workshop on Indian Language Data: Resources and Evaluation",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://aclanthology.org/2020.wildre-1.5/",
    pages = "25--28",
    language = "eng",
    ISBN = "979-10-95546-67-2",
    abstract = "To overpass the disparity between theory and applications in language-related technology in the text as well as speech and several other areas, a well-designed and well-developed corpus is essential. Several problems and issues encountered while developing a corpus, especially for low resource languages. The Malayalam Speech Corpus (MSC) is one of the first open speech corpora for Automatic Speech Recognition (ASR) research to the best of our knowledge. It consists of 250 hours of Agricultural speech data. We are providing a transcription file, lexicon and annotated speech along with the audio segment. It is available in future for public use upon request at ``www.iiitmk.ac.in/vrclc/utilities/ml{\_}speechcorpus''. This paper details the development and collection process in the domain of agricultural speech corpora in the Malayalam Language."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="k-r-etal-2020-malayalam">
    <titleInfo>
        <title>Malayalam Speech Corpus: Design and Development for Dravidian Language</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Lekshmi</namePart>
        <namePart type="family">K R</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jithesh</namePart>
        <namePart type="family">V S</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Elizabeth</namePart>
        <namePart type="family">Sherly</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <language>
        <languageTerm type="text">eng</languageTerm>
    </language>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Girish</namePart>
            <namePart type="given">Nath</namePart>
            <namePart type="family">Jha</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Kalika</namePart>
            <namePart type="family">Bali</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Sobha</namePart>
            <namePart type="family">L.</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">S</namePart>
            <namePart type="given">S</namePart>
            <namePart type="family">Agrawal</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Atul</namePart>
            <namePart type="given">Kr.</namePart>
            <namePart type="family">Ojha</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>European Language Resources Association (ELRA)</publisher>
            <place>
                <placeTerm type="text">Marseille, France</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">979-10-95546-67-2</identifier>
    </relatedItem>
    <abstract>To overpass the disparity between theory and applications in language-related technology in the text as well as speech and several other areas, a well-designed and well-developed corpus is essential. Several problems and issues encountered while developing a corpus, especially for low resource languages. The Malayalam Speech Corpus (MSC) is one of the first open speech corpora for Automatic Speech Recognition (ASR) research to the best of our knowledge. It consists of 250 hours of Agricultural speech data. We are providing a transcription file, lexicon and annotated speech along with the audio segment. It is available in future for public use upon request at “www.iiitmk.ac.in/vrclc/utilities/ml_speechcorpus”. This paper details the development and collection process in the domain of agricultural speech corpora in the Malayalam Language.</abstract>
    <identifier type="citekey">k-r-etal-2020-malayalam</identifier>
    <location>
        <url>https://aclanthology.org/2020.wildre-1.5/</url>
    </location>
    <part>
        <date>2020-05</date>
        <extent unit="page">
            <start>25</start>
            <end>28</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Malayalam Speech Corpus: Design and Development for Dravidian Language
%A K R, Lekshmi
%A V S, Jithesh
%A Sherly, Elizabeth
%Y Jha, Girish Nath
%Y Bali, Kalika
%Y L., Sobha
%Y Agrawal, S. S.
%Y Ojha, Atul Kr.
%S Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation
%D 2020
%8 May
%I European Language Resources Association (ELRA)
%C Marseille, France
%@ 979-10-95546-67-2
%G eng
%F k-r-etal-2020-malayalam
%X To overpass the disparity between theory and applications in language-related technology in the text as well as speech and several other areas, a well-designed and well-developed corpus is essential. Several problems and issues encountered while developing a corpus, especially for low resource languages. The Malayalam Speech Corpus (MSC) is one of the first open speech corpora for Automatic Speech Recognition (ASR) research to the best of our knowledge. It consists of 250 hours of Agricultural speech data. We are providing a transcription file, lexicon and annotated speech along with the audio segment. It is available in future for public use upon request at “www.iiitmk.ac.in/vrclc/utilities/ml_speechcorpus”. This paper details the development and collection process in the domain of agricultural speech corpora in the Malayalam Language.
%U https://aclanthology.org/2020.wildre-1.5/
%P 25-28


Markdown (Informal)

[Malayalam Speech Corpus: Design and Development for Dravidian Language](https://aclanthology.org/2020.wildre-1.5/) (K R et al., WILDRE 2020)

