Describing Language Variation in the Colophons of Armenian Manuscripts

Bastien Kindt; Emmanuel Van Elverdinghe

Abstract

The colophons of Armenian manuscripts constitute a large textual corpus spanning a millennium of written culture. These texts are highly diverse and rich in linguistic variation. This poses a challenge to NLP tools, especially since linguistic resources designed for or suited to Armenian remain scarce. In this paper, we deal with a sub-corpus of colophons written to commemorate the rescue of a manuscript and dating from 1286 to ca. 1450, a thematic group distinguished by a particularly high concentration of words exhibiting linguistic variation. The text is processed (lemmatization, POS tagging, and inflectional tagging) with the tools of the GREgORI Project, and the results are evaluated. Through a selection of examples, we show how variation is handled at each linguistic level (phonology, orthography, inflection, vocabulary, syntax). Complex variation, at the level of tokens or lemmata, is also considered. The results of this work are used to enrich and refine the linguistic resources of the GREgORI Project, which in turn benefits the processing of other texts.

Anthology ID: 2022.digitam-1.4
Volume: Proceedings of the Workshop on Processing Language Variation: Digital Armenian (DigitAm) within the 13th Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Victoria Khurshudyan, Nadi Tomeh, Damien Nouvel, Anaid Donabedian, Chahan Vidal-Gorene
Venue: DigitAm
Publisher: European Language Resources Association
Pages: 20–27
URL: https://aclanthology.org/2022.digitam-1.4/
Cite (ACL): Bastien Kindt and Emmanuel Van Elverdinghe. 2022. Describing Language Variation in the Colophons of Armenian Manuscripts. In Proceedings of the Workshop on Processing Language Variation: Digital Armenian (DigitAm) within the 13th Language Resources and Evaluation Conference, pages 20–27, Marseille, France. European Language Resources Association.
Cite (Informal): Describing Language Variation in the Colophons of Armenian Manuscripts (Kindt & Van Elverdinghe, DigitAm 2022)
PDF: https://aclanthology.org/2022.digitam-1.4.pdf

PDF Cite Search Fix data