Testing and Adapting the Representational Abilities of Large Language Models on Folktales in Low-Resource Languages

J. A. Meaney, Beatrice Alex, William Lamb


Abstract
Folktales are a rich resource of knowledge about the society and culture of a civilisation. Digital folklore research aims to use automated techniques to better understand these folktales, and it relies on abstract representations of the textual data. Although a number of large language models (LLMs) claim to be able to represent low-resource languages such as Irish and Gaelic, we present two classification tasks to explore how useful these representations are, and three adaptations to improve the performance of these models. We find that adapting the models to work with longer sequences, and continuing pre-training on the domain of folktales, improves classification performance, although these findings are tempered by the impressive performance of a baseline SVM with non-contextual features.
Anthology ID:
2024.nlp4dh-1.31
Volume:
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Month:
November
Year:
2024
Address:
Miami, USA
Editors:
Mika Hämäläinen, Emily Öhman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
Venue:
NLP4DH
Publisher:
Association for Computational Linguistics
Pages:
319–324
URL:
https://aclanthology.org/2024.nlp4dh-1.31
Cite (ACL):
J. A. Meaney, Beatrice Alex, and William Lamb. 2024. Testing and Adapting the Representational Abilities of Large Language Models on Folktales in Low-Resource Languages. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pages 319–324, Miami, USA. Association for Computational Linguistics.
Cite (Informal):
Testing and Adapting the Representational Abilities of Large Language Models on Folktales in Low-Resource Languages (Meaney et al., NLP4DH 2024)
PDF:
https://aclanthology.org/2024.nlp4dh-1.31.pdf