Compilation of a Synthetic Judeo-French Corpus

Iglika Nikolova-Stoupak, Gaël Lejeune, Eva Schaeffer-Lacroix


Abstract
This short paper describes how synthetic Judeo-French text can be derived. Judeo-French is one of several rare languages spoken and written by Jewish communities within a particular temporal and geographical frame (in this case, 11th- to 14th-century France). Very few resources exist in the language, and it is practically absent from contemporary Natural Language Processing (NLP). This work outlines the compilation of a synthetic Judeo-French corpus. To this end, a pipeline of transformations is applied to Old French text from the same general time period, yielding text that is as faithful as possible to the phonological, morphological and lexical characteristics attested in Judeo-French. Ultimately, the goal is for this synthetic corpus to serve as data augmentation in standard NLP tasks, such as Neural Machine Translation (NMT).
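The idea of a staged transformation pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`make_substitution`, `make_lexical_replacer`, `run_pipeline`) are hypothetical, and the rules are deliberately meaningless placeholders rather than real Old French or Judeo-French correspondences, which would have to come from the paper itself.

```python
import re
from typing import Callable, Dict, Iterable

# A transformation maps one text to another (hypothetical type alias).
Transform = Callable[[str], str]


def make_substitution(pattern: str, replacement: str) -> Transform:
    """Build a regex-based rewrite rule (e.g. for a phonological or morphological stage)."""
    compiled = re.compile(pattern)
    return lambda text: compiled.sub(replacement, text)


def make_lexical_replacer(lexicon: Dict[str, str]) -> Transform:
    """Build a whitespace-token replacement rule for a lexical stage."""
    def replace(text: str) -> str:
        return " ".join(lexicon.get(token, token) for token in text.split())
    return replace


def run_pipeline(text: str, stages: Iterable[Transform]) -> str:
    """Apply each stage in order, feeding the output of one into the next."""
    for stage in stages:
        text = stage(text)
    return text


# PLACEHOLDER rules only: these are NOT attested linguistic correspondences.
PHONOLOGICAL = make_substitution(r"aa", "a")
MORPHOLOGICAL = make_substitution(r"xx\b", "x")
LEXICAL = make_lexical_replacer({"foo": "bar"})

STAGES = [PHONOLOGICAL, MORPHOLOGICAL, LEXICAL]

if __name__ == "__main__":
    print(run_pipeline("foo baaxx", STAGES))
```

Keeping each stage as an independent function makes the order of application explicit and lets individual rule sets be swapped or tested in isolation.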
Anthology ID:
2024.latechclfl-1.5
Volume:
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Stan Szpakowicz
Venues:
LaTeCHCLfL | WS
Publisher:
Association for Computational Linguistics
Pages:
40–45
URL:
https://aclanthology.org/2024.latechclfl-1.5
Cite (ACL):
Iglika Nikolova-Stoupak, Gaël Lejeune, and Eva Schaeffer-Lacroix. 2024. Compilation of a Synthetic Judeo-French Corpus. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), pages 40–45, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Compilation of a Synthetic Judeo-French Corpus (Nikolova-Stoupak et al., LaTeCHCLfL-WS 2024)
PDF:
https://aclanthology.org/2024.latechclfl-1.5.pdf