Leveraging RAG for a Low-Resource Audio-Aware Diachronic Analysis of Gendered Toy Marketing

Luca Marinelli, Iacopo Ghinassi, Charalampos Saitis


Abstract
We performed a diachronic analysis of sound and language in toy commercials, leveraging retrieval-augmented generation (RAG) and open-weight language models in low-resource settings. A pool of 2508 UK toy advertisements spanning 14 years was semi-automatically annotated, integrating thematic coding of transcripts with audio annotation. With our RAG pipeline, we thematically coded the commercials and classified them by gender target audience (feminine, masculine, or mixed), achieving substantial inter-coder reliability. In parallel, a music-focused multitask model was applied to annotate affective and mid-level musical perceptual attributes, enabling multimodal discourse analysis. Our findings reveal significant diachronic shifts and enduring patterns. Soundtracks classified as energizing registered an overall increase across distinct themes and audiences, but this increase was steeper for masculine-adjacent commercials. Moreover, themes stereotypically associated with masculinity were paired more frequently with louder, distorted, and aggressive music, while stereotypically feminine themes were paired with softer, calmer, and more harmonious soundtracks. Code and data to reproduce the results are available at github.com/marinelliluca/low-resource-RAG.
Anthology ID:
2025.lm4dh-1.9
Volume:
Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Isuri Nanomi Arachchige, Francesca Frontini, Ruslan Mitkov, Paul Rayson
Venues:
LM4DH | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
102–111
URL:
https://aclanthology.org/2025.lm4dh-1.9/
Cite (ACL):
Luca Marinelli, Iacopo Ghinassi, and Charalampos Saitis. 2025. Leveraging RAG for a Low-Resource Audio-Aware Diachronic Analysis of Gendered Toy Marketing. In Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities, pages 102–111, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Leveraging RAG for a Low-Resource Audio-Aware Diachronic Analysis of Gendered Toy Marketing (Marinelli et al., LM4DH 2025)
PDF:
https://aclanthology.org/2025.lm4dh-1.9.pdf