2024
The Vedic Compound Dataset
Sven Sellmer | Oliver Hellwig
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
This paper introduces the Vedic Compound Dataset (VCD), the first resource providing annotated compounds from Vedic Sanskrit, a South Asian Indo-European language used from ca. 1500 to 500 BCE. The VCD aims at facilitating the study of language change in early Indo-Iranian and offers comparative material for quantitative cross-linguistic research on compounds. The process of annotating Vedic compounds is complex as they contain five of the six basic types of compounds defined by Scalise & Bisetto (2005), which are, however, not consistently marked in morphosyntax, making their automatic classification a significant challenge. The paper details the process of collecting and preprocessing the relevant data, with a particular focus on the question of how to distinguish exocentric from endocentric usage. It further discusses experiments with a simple ML classifier that uses compound-internal syntactic relations, outlines the composition of the dataset, and sketches directions for future research.
2023
The Vedic corpus as a graph. An updated version of Bloomfields Vedic Concordance
Oliver Hellwig | Sven Sellmer | Kyoko Amano
Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference
Hedging in diachrony: the case of Vedic Sanskrit iva
Erica Biagetti | Oliver Hellwig | Sven Sellmer
Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)
The rhetorical strategy of hedging serves to attenuate speech acts and their semantic content, as in English ‘kind of’ or ‘somehow’. While hedging has recently met with increasing interest in linguistic research, most studies deal with modern languages, preferably English, and take a synchronic approach. This paper complements this research by tracing the diachronic syntactic flexibilization of the Vedic Sanskrit particle iva from a marker of comparison (‘like’) to a full-fledged adaptor. We discuss the outcomes of a diachronic Bayesian framework applied to iva constructions in a Universal Dependencies treebank, and supplement these results with a qualitative discussion of relevant text passages.
2022
Detecting Diachronic Syntactic Developments in Presence of Bias Terms
Oliver Hellwig | Sven Sellmer
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
Corpus-based studies of diachronic syntactic changes are typically guided by the results of previous qualitative research. When such results are missing or, as is the case for Vedic Sanskrit, are restricted to small parts of a transmitted corpus, an exploratory framework that detects such changes in a data-driven fashion can substantially support the research process. In this paper, we introduce a customized version of the infinite relational model that groups syntactic constituents based on their structural similarities and their diachronic distributions. We propose a simple way to control for register and intellectual affiliation, and discuss our findings for four syntactic structures in Vedic texts.