SHR++: An Interface for Morpho-syntactic Annotation of Sanskrit Corpora

Amrith Krishna, Shiv Vidhyut, Dilpreet Chawla, Sruti Sambhavi, Pawan Goyal


Abstract
We propose SHR++, a web-based framework for morpho-syntactic annotation of Sanskrit corpora. SHR++ is designed to generate annotations for the word segmentation, morphological parsing and dependency analysis tasks in Sanskrit. It incorporates analyses and predictions from various tools designed for processing Sanskrit texts and utilises them to ease the cognitive load on human annotators. Specifically, SHR++ uses the Sanskrit Heritage Reader, a lexicon-driven shallow parser, to enumerate all the phonetically and lexically valid word splits, along with their morphological analyses, for a given string. This lets annotators choose among candidate solutions rather than perform the segmentation themselves. Further, predictions from a word segmentation tool are added as suggestions to aid the annotators in their decision making. Our evaluation shows that enabling this segmentation suggestion component reduces annotation time by 20.15%. SHR++ can be accessed online at http://vidhyut97.pythonanywhere.com/ and the codebase, for independent deployment of the system elsewhere, is hosted at https://github.com/iamdsc/smart-sanskrit-annotator.
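To make the idea of "enumerating all lexically valid word splits" concrete, the following is a minimal sketch and not the Sanskrit Heritage Reader's actual algorithm. It assumes a tiny hypothetical lexicon of romanized strings, chosen only to create ambiguity, and it ignores sandhi (the phonetic merging at word boundaries that the real tool must undo) as well as morphological analysis. The names LEXICON and enumerate_splits are illustrative and do not come from the SHR++ codebase.

```python
from functools import lru_cache

# Hypothetical toy lexicon; entries are illustrative, not a linguistic resource.
LEXICON = frozenset({"na", "nara", "raka", "ka", "narak", "naraka"} - {"narak"})


def enumerate_splits(text, lexicon):
    """Return every way to cut `text` into a sequence of lexicon entries.

    Simplification: words are matched literally, so no sandhi rules are
    applied at the boundaries.
    """

    @lru_cache(maxsize=None)
    def splits_from(start):
        # One way to segment the empty remainder: the empty sequence.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                # Extend every segmentation of the remainder with this word.
                for rest in splits_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(split) for split in splits_from(0)]


candidates = enumerate_splits("naraka", LEXICON)
for candidate in candidates:
    print(" + ".join(candidate))
```

An annotator-facing interface would present such candidates for selection instead of asking for a manual segmentation; a suggestion component could then highlight one candidate as the predicted choice.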
Anthology ID:
2020.lrec-1.874
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
7069–7076
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.874
Cite (ACL):
Amrith Krishna, Shiv Vidhyut, Dilpreet Chawla, Sruti Sambhavi, and Pawan Goyal. 2020. SHR++: An Interface for Morpho-syntactic Annotation of Sanskrit Corpora. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 7069–7076, Marseille, France. European Language Resources Association.
Cite (Informal):
SHR++: An Interface for Morpho-syntactic Annotation of Sanskrit Corpora (Krishna et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.874.pdf
Code:
iamdsc/smart-sanskrit-annotator