Sanskrit Sandhi Splitting using seq2(seq)2

Rahul Aralikatte; Neelamadhav Gantayat; Naveen Panwar; Anush Sankaran; Senthil Mani

doi:10.18653/v1/D18-1530

Abstract

In Sanskrit, small words (morphemes) are combined to form compound words through a process known as Sandhi. Sandhi splitting is the process of splitting a given compound word into its constituent morphemes. Although rules governing word splitting exist in the language, it is highly challenging to identify the location of the splits in a compound word. Though existing Sandhi splitting systems incorporate these pre-defined splitting rules, they have low accuracy because the same compound word might be broken down in multiple ways to provide syntactically correct splits. In this research, we propose a novel deep learning architecture called Double Decoder RNN (DD-RNN), which (i) predicts the location of the split(s) with 95% accuracy, and (ii) predicts the constituent words (learning the Sandhi splitting rules) with 79.5% accuracy, outperforming the state of the art by 20%. Additionally, we show the generalization capability of our deep learning model by showing competitive results on the problem of Chinese word segmentation as well.
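
The ambiguity the abstract describes can be illustrated with a small sketch. This is not the DD-RNN and not the authors' code; it is a toy enumerator under stated assumptions. The rule table `REVERSE_RULES` and the function `candidate_splits` are hypothetical names, and the table covers only one rule family (a long "ā" at a junction may arise from a/ā followed by a/ā). Even with this single rule, one compound yields several well-formed candidate splits at the same location, which is why rules alone do not settle the answer.

```python
# Toy illustration only: NOT the DD-RNN described in the abstract.
# Assumption: a single, hand-written reverse rule (hypothetical table).
A_LONG = "\u0101"  # the character "ā", written as an escape to avoid encoding issues

# A long "ā" in the compound may come from any of these (left end, right start) pairs.
REVERSE_RULES = {
    A_LONG: [("a", "a"), ("a", A_LONG), (A_LONG, "a"), (A_LONG, A_LONG)],
}


def candidate_splits(compound):
    """Return every (position, left, right) split the toy rule table allows."""
    candidates = []
    for pos, char in enumerate(compound):
        for left_end, right_start in REVERSE_RULES.get(char, []):
            left = compound[:pos] + left_end
            right = right_start + compound[pos + 1:]
            candidates.append((pos, left, right))
    return candidates


# "vidyālaya" is conventionally analysed as "vidyā" + "ālaya".
splits = candidate_splits("vidy" + A_LONG + "laya")
for pos, left, right in splits:
    print(pos, left, "+", right)
```

The enumerator mirrors the two sub-problems named in the abstract: the first tuple element is a split location, and the remaining two are the proposed constituent words. Choosing the correct candidate among them is the part a learned model would have to handle.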

Anthology ID: D18-1530
Volume: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month: October-November
Year: 2018
Address: Brussels, Belgium
Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 4909–4914
URL: https://aclanthology.org/D18-1530/
DOI: 10.18653/v1/D18-1530
Cite (ACL): Rahul Aralikatte, Neelamadhav Gantayat, Naveen Panwar, Anush Sankaran, and Senthil Mani. 2018. Sanskrit Sandhi Splitting using seq2(seq)2. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4909–4914, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Sanskrit Sandhi Splitting using seq2(seq)2 (Aralikatte et al., EMNLP 2018)
PDF: https://aclanthology.org/D18-1530.pdf
Attachment: D18-1530.Attachment.zip
