Sripathi Sripada


2021

lakṣyārtha (Indicated Meaning) of Śabdavyāpāra (Function of a Word) framework from kāvyaśāstra (The Science of Literary Studies) in Samskṛtam : Its application to Literary Machine Translation and other NLP tasks
Sripathi Sripada | Anupama Ryali | Raghuram Sheshadri
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

A key challenge in Literary Machine Translation is that the meaning of a sentence can differ from the sum of the meanings of its words. Handling this ordinarily requires large amounts of consistently labelled training data across a variety of usages and languages. In this paper, we propose that machine translation models can be trained economically to identify and paraphrase such sentences by leveraging the language-independent framework of Śabdavyāpāra (Function of a Word), from Literary Sciences in Saṃskṛtam, and its definition of lakṣyārtha (‘Indicated’ meaning). An Indicated meaning exists where the literal meanings of the words in a sentence are incompatible with one another, irrespective of language. The framework defines seven categories of Indicated meaning and their characteristics. As a pilot, we identified 300 such sentences from literary and regular usage, labelled them, trained a 2D Convolutional Neural Network to classify each sentence by its category of Indicated meaning, and fine-tuned a T5 model to paraphrase them. We compared these paraphrases with those produced by a T5 model fine-tuned on the Quora Paraphrase dataset of 400,000 sentence pairs. The T5 model fine-tuned on the Indicated meaning examples performed consistently better. Moreover, Google Translate translates these paraphrased sentences accurately and consistently across languages.