Nikita Neveditsin


2024

Classification of Buddhist Verses: The Efficacy and Limitations of Transformer-Based Models
Nikita Neveditsin | Ambuja Salgaonkar | Pawan Lingras | Vijay Mago
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities

This study assesses the ability of machine learning to classify verses from Buddhist texts into two categories: Therigatha and Theragatha, attributed to female and male authors, respectively. It highlights the difficulties of data preprocessing and of applying Transformer-based models to Devanagari script, which arise from limited vocabulary, and demonstrates that simple statistical models can be equally effective. The research suggests areas for future exploration, provides the dataset for further study, and acknowledges existing limitations and challenges.