Cyrillic-MNIST: a Cyrillic Version of the MNIST Dataset
Bolat Tleubayev, Zhanel Zhexenova, Kenessary Koishybay, Anara Sandygulova
Abstract
This paper presents a new handwritten dataset, Cyrillic-MNIST, a Cyrillic version of the MNIST dataset, comprising 121,234 samples of 42 Cyrillic letters. The performance of Cyrillic-MNIST is evaluated using standard deep learning approaches and is compared to the Extended MNIST (EMNIST) dataset. The dataset is available at https://github.com/bolattleubayev/cmnist
- Anthology ID:
- 2022.lrec-1.510
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 4767–4773
- URL:
- https://aclanthology.org/2022.lrec-1.510
- Bibkey:
- tleubayev-etal-2022-cyrillic
- Cite (ACL):
- Bolat Tleubayev, Zhanel Zhexenova, Kenessary Koishybay, and Anara Sandygulova. 2022. Cyrillic-MNIST: a Cyrillic Version of the MNIST Dataset. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4767–4773, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Cyrillic-MNIST: a Cyrillic Version of the MNIST Dataset (Tleubayev et al., LREC 2022)
- PDF:
- https://aclanthology.org/2022.lrec-1.510.pdf
- Data
- How2Sign
Export citation
@inproceedings{tleubayev-etal-2022-cyrillic,
    title = "{C}yrillic-{MNIST}: a {C}yrillic Version of the {MNIST} Dataset",
    author = "Tleubayev, Bolat and
      Zhexenova, Zhanel and
      Koishybay, Kenessary and
      Sandygulova, Anara",
    editor = "Calzolari, Nicoletta and
      B{\'e}chet, Fr{\'e}d{\'e}ric and
      Blache, Philippe and
      Choukri, Khalid and
      Cieri, Christopher and
      Declerck, Thierry and
      Goggi, Sara and
      Isahara, Hitoshi and
      Maegaard, Bente and
      Mariani, Joseph and
      Mazo, H{\'e}l{\`e}ne and
      Odijk, Jan and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.510",
    pages = "4767--4773",
    abstract = "This paper presents a new handwritten dataset, Cyrillic-MNIST, a Cyrillic version of the MNIST dataset, comprising of 121,234 samples of 42 Cyrillic letters. The performance of Cyrillic-MNIST is evaluated using standard deep learning approaches and is compared to the Extended MNIST (EMNIST) dataset. The dataset is available at \url{https://github.com/bolattleubayev/cmnist}",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="tleubayev-etal-2022-cyrillic">
    <titleInfo>
      <title>Cyrillic-MNIST: a Cyrillic Version of the MNIST Dataset</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Bolat</namePart>
      <namePart type="family">Tleubayev</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Zhanel</namePart>
      <namePart type="family">Zhexenova</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Kenessary</namePart>
      <namePart type="family">Koishybay</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Anara</namePart>
      <namePart type="family">Sandygulova</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2022-06</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the Thirteenth Language Resources and Evaluation Conference</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Nicoletta</namePart>
        <namePart type="family">Calzolari</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Frédéric</namePart>
        <namePart type="family">Béchet</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Philippe</namePart>
        <namePart type="family">Blache</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Khalid</namePart>
        <namePart type="family">Choukri</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Christopher</namePart>
        <namePart type="family">Cieri</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Thierry</namePart>
        <namePart type="family">Declerck</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Sara</namePart>
        <namePart type="family">Goggi</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Hitoshi</namePart>
        <namePart type="family">Isahara</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Bente</namePart>
        <namePart type="family">Maegaard</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Joseph</namePart>
        <namePart type="family">Mariani</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Hélène</namePart>
        <namePart type="family">Mazo</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Jan</namePart>
        <namePart type="family">Odijk</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Stelios</namePart>
        <namePart type="family">Piperidis</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>European Language Resources Association</publisher>
        <place>
          <placeTerm type="text">Marseille, France</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>This paper presents a new handwritten dataset, Cyrillic-MNIST, a Cyrillic version of the MNIST dataset, comprising of 121,234 samples of 42 Cyrillic letters. The performance of Cyrillic-MNIST is evaluated using standard deep learning approaches and is compared to the Extended MNIST (EMNIST) dataset. The dataset is available at https://github.com/bolattleubayev/cmnist</abstract>
    <identifier type="citekey">tleubayev-etal-2022-cyrillic</identifier>
    <location>
      <url>https://aclanthology.org/2022.lrec-1.510</url>
    </location>
    <part>
      <date>2022-06</date>
      <extent unit="page">
        <start>4767</start>
        <end>4773</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T Cyrillic-MNIST: a Cyrillic Version of the MNIST Dataset
%A Tleubayev, Bolat
%A Zhexenova, Zhanel
%A Koishybay, Kenessary
%A Sandygulova, Anara
%Y Calzolari, Nicoletta
%Y Béchet, Frédéric
%Y Blache, Philippe
%Y Choukri, Khalid
%Y Cieri, Christopher
%Y Declerck, Thierry
%Y Goggi, Sara
%Y Isahara, Hitoshi
%Y Maegaard, Bente
%Y Mariani, Joseph
%Y Mazo, Hélène
%Y Odijk, Jan
%Y Piperidis, Stelios
%S Proceedings of the Thirteenth Language Resources and Evaluation Conference
%D 2022
%8 June
%I European Language Resources Association
%C Marseille, France
%F tleubayev-etal-2022-cyrillic
%X This paper presents a new handwritten dataset, Cyrillic-MNIST, a Cyrillic version of the MNIST dataset, comprising of 121,234 samples of 42 Cyrillic letters. The performance of Cyrillic-MNIST is evaluated using standard deep learning approaches and is compared to the Extended MNIST (EMNIST) dataset. The dataset is available at https://github.com/bolattleubayev/cmnist
%U https://aclanthology.org/2022.lrec-1.510
%P 4767-4773
Markdown (Informal)
[Cyrillic-MNIST: a Cyrillic Version of the MNIST Dataset](https://aclanthology.org/2022.lrec-1.510) (Tleubayev et al., LREC 2022)