SanskritShala: A Neural Sanskrit NLP Toolkit with Web-Based Interface for Pedagogical and Annotation Purposes

Jivnesh Sandhan, Anshul Agarwal, Laxmidhar Behera, Tushar Sandhan, Pawan Goyal


Abstract
We present a neural Sanskrit Natural Language Processing (NLP) toolkit named SanskritShala (a school of Sanskrit) to facilitate computational linguistic analyses for several tasks such as word segmentation, morphological tagging, dependency parsing, and compound type identification. Our systems currently report state-of-the-art performance on available benchmark datasets for all tasks. SanskritShala is deployed as a web-based application, which allows a user to get real-time analysis for the given input. It is built with easy-to-use interactive data annotation features that allow annotators to correct the system's predictions when it makes mistakes. We publicly release the source code of the 4 modules included in the toolkit, 7 word embedding models trained on publicly available Sanskrit corpora, and multiple annotated datasets (word similarity, relatedness, categorization, and analogy prediction) for assessing intrinsic properties of word embeddings. To the best of our knowledge, this is the first neural Sanskrit NLP toolkit that has a web-based interface and a number of NLP modules. We believe that those who wish to work with Sanskrit will find it useful for pedagogical and annotation purposes. SanskritShala is available at: https://cnerg.iitkgp.ac.in/sanskritshala. The demo video of our platform can be accessed at: https://youtu.be/x0X31Y9k0mw4.
Anthology ID:
2023.acl-demo.10
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Danushka Bollegala, Ruihong Huang, Alan Ritter
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
103–112
URL:
https://aclanthology.org/2023.acl-demo.10
DOI:
10.18653/v1/2023.acl-demo.10
Cite (ACL):
Jivnesh Sandhan, Anshul Agarwal, Laxmidhar Behera, Tushar Sandhan, and Pawan Goyal. 2023. SanskritShala: A Neural Sanskrit NLP Toolkit with Web-Based Interface for Pedagogical and Annotation Purposes. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 103–112, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
SanskritShala: A Neural Sanskrit NLP Toolkit with Web-Based Interface for Pedagogical and Annotation Purposes (Sandhan et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-demo.10.pdf
Video:
https://aclanthology.org/2023.acl-demo.10.mp4