Yuki Kyogoku


2024

pdf bib
Exploring Similarity Measures and Intertextuality in Vedic Sanskrit Literature
So Miyagawa | Yuki Kyogoku | Yuzuki Tsukagoshi | Kyoko Amano
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities

This paper examines semantic similarity and intertextuality in selected texts from the Vedic Sanskrit corpus, specifically the Maitrāyaṇī Saṃhitā (MS) and Kāṭhaka-Saṃhitā (KS). Three computational methods are employed: Word2Vec for word embeddings, stylo package for stylometric analysis, and TRACER for text reuse detection. By comparing various sections of the texts at different granularities, patterns of similarity and structural alignment are uncovered, providing insights into textual relationships and chronology. Word embeddings capture semantic similarities, while stylometric analysis reveals clusters and components that differentiate the texts. TRACER identifies parallel passages, indicating probable instances of text reuse. The computational analysis corroborates previous philological studies, suggesting a shared period of composition between MS.1.9 and MS.1.7. This research highlights the potential of computational methods in studying ancient Sanskrit literature, complementing traditional approaches. The agreement among the methods strengthens the validity of the findings, and the visualizations offer a nuanced understanding of textual connections. The study demonstrates that smaller chunk sizes are more effective for detecting intertextual parallels, showcasing the power of these techniques in unraveling the complexities of ancient texts.