Classification of Buddhist Verses: The Efficacy and Limitations of Transformer-Based Models

Nikita Neveditsin, Ambuja Salgaonkar, Pawan Lingras, Vijay Mago


Abstract
This study assesses the ability of machine learning to classify verses from Buddhist texts into two categories: Therigatha and Theragatha, attributed to female and male authors, respectively. It highlights the difficulties in data preprocessing and the use of Transformer-based models on Devanagari script due to limited vocabulary, demonstrating that simple statistical models can be equally effective. The research suggests areas for future exploration, provides the dataset for further study, and acknowledges existing limitations and challenges.
Anthology ID:
2024.nlp4dh-1.37
Volume:
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Month:
November
Year:
2024
Address:
Miami, USA
Editors:
Mika Hämäläinen, Emily Öhman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
Venue:
NLP4DH
Publisher:
Association for Computational Linguistics
Pages:
377–385
URL:
https://aclanthology.org/2024.nlp4dh-1.37
Cite (ACL):
Nikita Neveditsin, Ambuja Salgaonkar, Pawan Lingras, and Vijay Mago. 2024. Classification of Buddhist Verses: The Efficacy and Limitations of Transformer-Based Models. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pages 377–385, Miami, USA. Association for Computational Linguistics.
Cite (Informal):
Classification of Buddhist Verses: The Efficacy and Limitations of Transformer-Based Models (Neveditsin et al., NLP4DH 2024)
PDF:
https://aclanthology.org/2024.nlp4dh-1.37.pdf