%0 Conference Proceedings
%T Nepali Encoder Transformers: An Analysis of Auto Encoding Transformer Language Models for Nepali Text Classification
%A Maskey, Utsav
%A Bhatta, Manish
%A Bhatt, Shiva
%A Dhungel, Sanket
%A Bal, Bal Krishna
%Y Melero, Maite
%Y Sakti, Sakriani
%Y Soria, Claudia
%S Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
%D 2022
%8 June
%I European Language Resources Association
%C Marseille, France
%F maskey-etal-2022-nepali
%X Language model pre-training has significantly impacted NLP and resulted in performance gains on many NLP-related tasks, but comparative study of different approaches on many low-resource languages seems to be missing. This paper attempts to investigate appropriate methods for pretraining a Transformer-based model for the Nepali language. We focus on the language-specific aspects that need to be considered for modeling. Although some language models have been trained for Nepali, the study is far from sufficient. We train three distinct Transformer-based masked language models for Nepali text sequences: distilbert-base (Sanh et al., 2019) for its efficiency and minuteness, deberta-base (P. He et al., 2020) for its capability of modeling the dependency of nearby token pairs and XLM-ROBERTa (Conneau et al., 2020) for its capabilities to handle multilingual downstream tasks. We evaluate and compare these models with other Transformer-based models on a downstream classification task with an aim to suggest an effective strategy for training low-resource language models and their fine-tuning.
%U https://aclanthology.org/2022.sigul-1.14
%P 106-111